OM nucleic - nucleic search, using sw model

Run on: March 25, 2004, 15:56:05 ; Search time 3413 Seconds 

(without alignments) 
9921.979 Million cell updates/sec 

Title: US-10-091-628-1

Perfect score: 1134 

Sequence: 1 atgagagccaattgttccag . . acatcacttcatgtgaatag 1134 



Scoring table: IDENTITY_NUC
               Gapop 10.0 , Gapext 1.0
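
The header names a Smith-Waterman ("sw") model with an opening gap penalty of 10.0 and an extension penalty of 1.0. The following is a minimal sketch of a local alignment score under affine gap costs, not the search engine's own code. It assumes +1 per match and -1 per mismatch (the actual IDENTITY_NUC values are not given in this listing) and assumes that a gap of length k costs gap_open + (k - 1) * gap_ext; the function name and parameters are hypothetical.

```python
def sw_affine_score(a, b, match=1.0, mismatch=-1.0, gap_open=10.0, gap_ext=1.0):
    """Best local alignment score with affine gaps (Gotoh-style recurrences).

    Assumed scoring: match/mismatch values and the gap convention are
    illustrative choices, not values confirmed by the report.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score ending at (i, j); E/F: best score ending in a gap.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_ext)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_ext)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# A sequence aligned to itself scores its own length under +1 per match,
# which is consistent with "Perfect score: 1134" for a 1134-base query.
print(sw_affine_score("ATGAGAGCCAATTGTTCCAG", "ATGAGAGCCAATTGTTCCAG"))
```

With these assumed costs a single-base gap is only worth opening when it joins two well-matching segments, which is why the alignments in this listing contain very few gaps.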



Searched: 27513289 seqs, 14931090276 residues 
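
The throughput figure in the header can be cross-checked from the other header values. The sketch below assumes that every database residue is compared against every query base on both strands; the factor of two is an assumption, suggested by the hit total being exactly twice the number of sequences searched. Variable names are illustrative.

```python
# Figures copied from the search header.
query_length = 1134          # query length (equal to the perfect score)
db_residues = 14931090276    # residues searched
search_seconds = 3413        # search time, without alignments
strands = 2                  # assumption: both strands are scored

cell_updates = query_length * db_residues * strands
rate_millions = cell_updates / search_seconds / 1e6
print(f"{rate_millions:.3f} Million cell updates/sec")
```

Under that assumption the computed rate agrees with the reported 9921.979 Million cell updates/sec.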

Total number of hits satisfying chosen parameters: 55026578

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 45 summaries

1: em_estba:*
2: em_esthum:*
3: em_estin:*
4: em_estmu:*
5: em_estov:*
6: em_estpl:*
7: em_estro:*
8: em_htc:*
9: gb_est1:*
10: gb_est2:*
11: gb_htc:*
12: gb_est3:*
13: gb_est4:*
14: gb_est5:*
15: em_estfun:*
16: em_estom:*
17: em_gss_hum:*
18: em_gss_inv:*
19: em_gss_pln:*
20: em_gss_vrt:*
21: em_gss_fun:*
22: em_gss_mam:*
23: em_gss_mus:*
24: em_gss_pro:*
25: em_gss_rod:*
26: em_gss_phg:*
27: em_gss_vrl:*
28: gb_gss1:*
29: gb_gss2:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
AK018423
LOCUS       AK018423     2125 bp    mRNA    linear   HTC 20-SEP-2003
DEFINITION  Mus musculus 16 days embryo lung cDNA, RIKEN full-length enriched
            library, clone:8430417G17 product:hypothetical Sodium bile acid
            symporter containing protein, full insert sequence.
ACCESSION   AK018423
VERSION     AK018423.1  GI:12858114
KEYWORDS    HTC; CAP trapper.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Carninci,P. and Hayashizaki,Y.
  TITLE     High-efficiency full-length cDNA cloning
  JOURNAL   Meth. Enzymol. 303, 19-44 (1999)
  MEDLINE   99279253
   PUBMED   10349636
REFERENCE   2
  AUTHORS   Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new genes
  JOURNAL   Genome Res. 10 (10), 1617-1630 (2000)
  MEDLINE   20499374
   PUBMED   11042159
REFERENCE   3
  AUTHORS   Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N., Carninci,P.,
            Konno,H., Akiyama,J., Nishi,K., Kitsunai,T., Tashiro,H., Itoh,M.,
            Sumi,N., Ishii,Y., Nakamura,S., Hazama,M., Nishine,T., Harada,A.,
            Yamamoto,R., Matsumoto,H., Sakaguchi,S., Ikegami,T., Kashiwagi,K.,
            Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., Watahiki,M.,
            Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai,J.,
            Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
  TITLE     RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer
  JOURNAL   Genome Res. 10 (11), 1757-1771 (2000)
  MEDLINE   20530913
   PUBMED   11076861
REFERENCE   4
  AUTHORS   The RIKEN Genome Exploration Research Group Phase II Team and the
            FANTOM Consortium.
  TITLE     Functional annotation of a full-length mouse cDNA collection
  JOURNAL   Nature 409, 685-690 (2001)
REFERENCE   5
  AUTHORS   The FANTOM Consortium and the RIKEN Genome Exploration Research
            Group Phase I & II Team.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
REFERENCE   6  (bases 1 to 2125)
  AUTHORS   Adachi,J., Aizawa,K., Akahira,S., Akimura,T., Arai,A., Aono,H.,
            Arakawa,T., Bono,H., Carninci,P., Fukuda,S., Fukunishi,Y.,
            Furuno,M., Hanagaki,T., Hara,A., Hayatsu,N., Hiramoto,K.,
            Hiraoka,T., Hori,F., Imotani,K., Ishii,Y., Itoh,M., Izawa,M.,
            Kasukawa,T., Kato,H., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Nishi,K.,
            Nomura,K., Numazaki,R., Ohno,M., Okazaki,Y., Okido,T., Owa,C.,
            Saito,H., Saito,R., Sakai,C., Sakai,K., Sano,H., Sasaki,D.,
            Shibata,K., Shibata,Y., Shinagawa,A., Shiraki,T., Sogabe,Y.,
            Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F., Tanaka,T.,
            Tejima,Y., Toya,T., Yamamura,T., Yasunishi,A., Yoshida,K.,
            Yoshino,M., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (10-JUL-2000) Yoshihide Hayashizaki, The Institute of
            Physical and Chemical Research (RIKEN), Laboratory for Genome
            Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
            RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
            Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
            URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
            Fax:81-45-503-9216)
COMMENT     Please visit our web site (http://genome.gsc.riken.go.jp/) for
            further details.
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues. First strand cDNA was primed with a primer
            [5' GAGAGAGAGAAGGATCCAAGAGCTCTTTTTTTTTTTTTTTTVN 3'], cDNA was
            prepared by using trehalose thermo-activated reverse transcriptase
            and subsequently enriched for full-length by cap-trapper. cDNA went
            through one round of normalization to Rot = 10.0 and subtraction to
            Rot = 185.2. Second strand cDNA was prepared with the primer
            adapter of sequence [5'
            GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC 3']. cDNA was cleaved
            with BamHI and XhoI. Vector: a modified pBluescript KS(+) after
            bulk excision from Lambda FLC I. Cloning sites, 5' end: SalI; 3'
            end: BamHI. Host: DH10B.
FEATURES             Location/Qualifiers
     source          1..2125
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="FANTOM_DB:8430417G17"
                     /db_xref="MGI:1909149"
                     /db_xref="taxon:10090"
                     /clone="8430417G17"
                     /tissue_type="lung"
                     /clone_lib="RIKEN full-length enriched mouse cDNA library"
                     /dev_stage="16 days embryo"
     CDS             173..1294
                     /note="unnamed protein product; hypothetical Sodium bile
                     acid symporter containing protein (InterPro|IPR002657,
                     evidence: InterPro)
                     putative"
                     /codon_start=1
                     /protein_id="BAB31203.1"
                     /db_xref="GI:12858115"
                     /translation="MSTDCAGNSTCPVNSTEEDPPVGMEGHANLKLLFTVLSAVMVGL
                     VMFSFGCSVESQKLWLHLRRPWGIAVGLLSQFGLMPLTAYLLAIGFGLKPFQAIAVLM
                     MGSCPGGTISNVLTFWVDGDMDLSISMTTCSTVAALGMMPLCLYIYTRSWTLTQNLVI
                     PYQSIGITLVSLVVPVASGVYVNYRWPKQATVILKVGAILGGMLLLVVAVTGMVLAKG
                     WNTDVTLLVISCIFPLVGHVTGFLLAFLTHQSWQRCRTISIETGAQNIQLCIAMLQLS
                     FSAEYLVQLLNFALAYGLFQVLHGLLIVAAYQAYKRRQKSKCRRQHPDCPDVCYEKQP
                     RETSAFLDKGDEAAVTLGPVQPEQHHRAAELTSHIPSCE"
ORIGIN


Query Match 62.1%; Score 704.4; DB 11; Length 2125; 

Best Local Similarity 77.7%; Pred. No. 6.1e-183; 

Matches 881; Conservative 0; Mismatches 241; Indels 12; Gaps 2; 
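
The percentages in each summary block appear to follow from the other reported numbers. The sketch below shows the relationships the figures are consistent with; these formulas are inferred from the values in this listing, not taken from the search program's documentation, and the variable names are illustrative.

```python
# Values copied from the header and from the summary block above (AK018423).
perfect_score = 1134
score = 704.4
matches, mismatches, indels = 881, 241, 12

# Inferred: Query Match is the hit score as a percentage of the perfect score.
query_match = 100 * score / perfect_score

# Inferred: Best Local Similarity is matches over all aligned columns.
aligned_columns = matches + mismatches + indels
best_local_similarity = 100 * matches / aligned_columns

print(f"Query Match {query_match:.1f}%")
print(f"Best Local Similarity {best_local_similarity:.1f}%")
```

The same relationships reproduce the percentages reported for the other hits in this listing.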

Qy 1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60

I I I I I ! I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 173 ATGAGCACAGACTGTGCGGGCAACTCCACCTGCCCTGTCAACAGTACGGAGGAAGACCCG 232

Qy 61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 

Db 233 CCCGTGGGAATGGAGGGCCATGCGAATCTAAAGCTGCTTTTTACAGTGCTCTCGGCTGTG 292

Qy 121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 293 ATGGTGGGTTTGGTCATGTTCTCTTTTGGATGTTCTGTGGAGAGTCAGAAGCTCTGGTTG 352 

Qy 181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240 

III I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I M I I M I I I I I I 

Db 353 CACCTCAGAAGACCCTGGGGCATCGCAGTGGGCCTGCTTTCCCAGTTTGGACTTATGCCT 412 

Qy 241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300

I I I I I I I I I I II I I I I I I I MM I I I I I I Ml I II I I I I I I I I I I I I I 

Db 413 CTGACAGCTTATCTGTTAGCCATTGGCTTCGGTCTGAAACCATTCCAAGCTATTGCTGTC 472 

Qy 301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360 

Mill I I M I M I I I II I I II II I II I M I I I I I M I II M I I I I I I I I I I I I 

Db 473 CTCATGATGGGGAGCTGCCCTGGGGGCACCATCTCTAATGTTCTCACCTTCTGGGTTGAT 532 

Qy 361 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 420

I II M M I M I I M I M I M I M I I M II M M M M I M I II I I I I I I I I I I I I I I I I 

Db 533 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACAGTGGCCGCCCTGGGAATG 592

Qy 421 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 480

II II I I I I I I I I I I I I I I II I I I I M I I I I I M I II M I I I I I I I I 

Db 593 ATGCCTCTCTGCCTCTACATCTACACCCGGTCCTGGACTCTGACACAGAACCTCGTCATT 652

Qy 481 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 540

II M II I II M II I I I I M I M I I II I I I I II I II II II M I I I III Ml 

Db 653 CCGTATCAGAGCATAGGAATTACCCTTGTGTCCCTGGTGGTTCCTGTGGCTTCTGGCGTC 712 

Qy 541 TATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTT 600

II I II I II I II I I I I I I I M I I I I I I I I I I II II I II I I I I I I I I I 
Db 713 TATGTGAATTATAGGTGGCCAAAGCAAGCAACGGTCATTCTCAAGGTCGGAGCCATTCTG 772

Qy 601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 660 

I II I I I II I I I I I I I I II I II M I I I II I I I I I I II II I I I M I M I 

Db 773 GGTGGCATGCTCCTCCTGGTGGTGGCAGTTACTGGCATGGTCCTGGCAAAAGG---CTGG 829



Qy 661 AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 720

M I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 830 AACACAGACGTCACTCTTCTGGTCATCAGCTGCATTTTCCCCTTGGTCGGCCATGTCACA 889 

Qy 721 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 780 

I I I I I I I I I I I! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 890 GGCTTCCTGCTGGCATTCCTCACCCACCAATCTTGGCAAAGGTGCAGGACCATTTCCATA 949

Qy 781 GAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCT 840

II I I I I I MINIM II Ml I I I I I I I I I I I I I I I III I II III Mill 

Db 950 GAGACTGGCGCTCAGAACATCCAGCTGTGCATCGCCATGCTGCAGCTGTCCTTCTCTGCT 1009 

Qy 841 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 900

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I 

Db 1010 GAGTACCTGGTCCAGCTGCTAAACTTTGCATTGGCCTATGGACTCTTCCAAGTGCTGCAC 1069

Qy 901 GGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGA 960

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Ml I 

Db 1070 GGGCTGCTCATTGTCGCAGCATATCAGGCATACAAGAGGAGGCAGAAGAGTAAATGCAGG 1129

Qy 961 AAAAAGAACTCAGGTTGCACAGAAGTCTGCCATACGAGGAAATCGACTTCTTCCAGAGAG 1020

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 

Db 1130 AGACAGCACCCGGATTGCCCAGACGTCTGCTACGAGAAGCA---------GCCCAGAGAG 1180

Qy 1021 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1080 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1181 ACCAGTGCTTTCTTGGATAAAGGGGATGAGGCTGCCGTAACTCTGGGGCCAGTGCAGCCA 1240

Qy 1081 ATGGATTGCCACAGGGCTCTCGAGCCAGTTGGCCACATCACTTCATGTGAATAG 1134 

II I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I 

Db 1241 GAGCAGCACCACAGGGCTGCTGAGCTGACTAGCCACATTCCTTCATGTGAATAG 1294
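
The marker rows between each Qy and Db line did not survive extraction cleanly, but they can be regenerated from the two sequence rows. A minimal sketch, assuming the marker row carries a "|" under every identical column and a blank elsewhere (the function name is hypothetical):

```python
def match_line(qy: str, db: str) -> str:
    """Return a marker row with '|' under identical, non-gap columns."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have the same number of columns")
    return "".join(
        "|" if q == d and q != "-" else " " for q, d in zip(qy, db)
    )


# First block of the AK018423 alignment (Qy 1-60 against Db 173-232).
qy = "ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG"
db = "ATGAGCACAGACTGTGCGGGCAACTCCACCTGCCCTGTCAACAGTACGGAGGAAGACCCG"
print(qy)
print(match_line(qy, db))
print(db)
```

Gap columns, written as "-" in either row, are left blank, so the same helper works for the blocks that contain indels.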
LOCUS       BB613812      652 bp    mRNA    linear   EST 26-OCT-2001
DEFINITION  BB613812 RIKEN full-length enriched, 0 day neonate head Mus
            musculus cDNA clone 4831431E11 5', mRNA sequence.
ACCESSION   BB613812
VERSION     BB613812.1  GI:16454310
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 652)
  AUTHORS   Arakawa,T., Carninci,P., Fukuda,S., Furuno,M., Hanagaki,T.,
            Hara,A., Hiramoto,K., Hori,F., Ishii,Y., Ito,M., Kawai,J.,
            Konno,H., Kouda,M., Koya,S., Matsuyama,T., Miyazaki,A., Nomura,K.,
            Ohno,M., Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K.,
            Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F.,
            Takeda,Y., Tanaka,T., Toya,T., Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Mouse ESTs (Arakawa,T., et al. 2001)
  JOURNAL   Unpublished (2001)
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A.
            and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Kondo,S., Shinagawa,A., Saito,T., Kiyosawa,H., Yamanaka,I.,
            Aizawa,K., Fukuda,S., Hara,A., Itoh,M., Kawai,J., Shibata,K. and
            Hayashizaki,Y.
            Computational Analysis of Full-Length Mouse cDNAs Compared with
            Human Genome Sequences. Mamm. Genome. 12, 673-677 (2001)
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
e mouse tissues. 
FEATURES             Location/Qualifiers
     source          1..652
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="4831431E11"
                     /sex="mixed"
                     /tissue_type="head"
                     /dev_stage="0 day neonate"
                     /lab_host="DH10B"
                     /clone_lib="RIKEN full-length enriched, 0 day neonate
                     head"
                     /note="Site_1: SalI; Site_2: BamHI; cDNA library was
                     prepared and sequenced in Mouse Genome Encyclopedia
                     Project of Genome Exploration Research Group in Riken
                     Genomic Sciences Center and Genome Science Laboratory in
                     RIKEN. Division of Experimental Animal Research in Riken
                     contributed to prepare mouse tissues. 1st strand cDNA was
                     primed with a primer [5'
                     GAGAGAGAGAAGGATCCAAGAGCTCTTTTTTTTTTTTTTTTVN 3'], cDNA was
                     prepared by using trehalose thermo-activated reverse
                     transcriptase and subsequently enriched for full-length by
                     cap-trapper. cDNA went through one round of normalization
                     to Rot = 10.0 and subtraction to Rot = 100.0. Second
                     strand cDNA was prepared with the primer adapter of
                     sequence [5' GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC
                     3']. cDNA was cloned into the XhoI and BamHI sites.
                     Vector: a modified pBluescript KS(+) after bulk excision
                     from Lambda FLC I."

ORIGIN 

Query Match 39.7%; Score 450.4; DB 10; Length 652; 

Best Local Similarity 81.7%; Pred. No. 5.7e-113; 

Matches 533; Conservative 0; Mismatches 116; Indels 3; Gaps 1; 

Qy 92 AGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGTTCTCTTTGGGAT 151 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 4 AGCTGCTTTTTACAGTGCTCTCGGCTGTGATGGTGGGTTTGGTCATGTTCTCTTTTGGAT 63 

Qy 152 GTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGGGCATTGCTGTGG 211 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 64 GTTCTGTGGAGAGTCAGAAGCTCTGGTTGCACCTCAGAAGACCCTGGGGCATCGCAGTGG 123 

Qy 212 GACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGGCCATTAGCTTTT 271 

I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I 

Db 124 GCCTGCTTTCCCAGTTTGGACTTATGCCTCTGACAGCTTATCTGTTAGCCATTGGCTTCG 183 

Qy 272 CTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCCCGGGGGGCACCA 331 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I 

Db 184 GTCTGAAACCATTCCAAGCTATTGCTGTCCTCATGATGGGGAGCTGCCCTGGGGGCACCA 243 

Qy 332 TCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCATCAGTATGACAA 391

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 11 I I I I I I I I I ! I I I I I I I I I I I I I I 
Db 244 TCTCTAATGTTCTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCATCAGTATGACAA 303

Qy 392 CCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATCTCTACACCTGGT 451 

I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 304 CCTGTTCCACAGTGGCCGCCCTGGGAATGATGCCTCTCTGCCTCTACATCTACACCCGGT 363 

Qy 452 CCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAATTACCCTTGTGT 511

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 364 CCTGGACTCTGACACAGAACCTCGTCATTCCGTATCAGAGCATAGGAATTACCCTTGTGT 423

Qy 512 GCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGCCAAAACAATCCA 571

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 424 CCCTGGTGGTTCCTGTGGCTTCTGGCGTCTATGTGAATTATAGGTGGCCAAAGCAAGCAA 483

Qy 572 AAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTG 631 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 484 CGGTCATTCTCAAGGTCGGAGCCATTCTGGGTGGCATGCTCCTCCTGGTGGTGGCAGTTA 543

Qy 632 CTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTCTGACCATCAGTT 691

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 544 CTGGCATGGTCCTGGCAAAAGG---CTGGAACACAGACGTCACTCTTCTGGTCATCAGCT 600

Qy 692 TCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTTTTAC 743

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
Db 601 GCATTTTCCCCTTGGTCGGCCATGTCACAGGCTTCCTGCTGGGCATTCTCAC 652 



RESULT 3 



BY720552
LOCUS       BY720552      952 bp    mRNA    linear   EST 17-DEC-2002
DEFINITION  BY720552 RIKEN full-length enriched, 16 days embryo lung Mus
            musculus cDNA clone 8430417G17 5', mRNA sequence.
ACCESSION   BY720552
VERSION     BY720552.1  GI:27133669
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 952)
  AUTHORS   Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S.,
            Nikaido,I., Osato,N., Saito,R., Suzuki,H., Yamanaka,I.,
            Kiyosawa,H., Yagi,K., Tomaru,Y., Hasegawa,Y., Nogami,A.,
            Schonbach,C., Gojobori,T., Baldarelli,R., Hill,D.P., Bult,C.,
            Hume,D.A., Quackenbush,J., Schriml,L.M., Kanapin,A., Matsuda,H.,
            Batalov,S., Beisel,K.W., Blake,J.A., Bradt,D., Brusic,V.,
            Chothia,C., Corbani,L.E., Cousins,S., Dalla,E., Dragani,T.A.,
            Fletcher,C.F., Forrest,A., Frazer,K.S., Gaasterland,T.,
            Gariboldi,M., Gissi,C., Godzik,A., Gough,J., Grimmond,S.,
            Gustincich,S., Hirokawa,N., Jackson,I.J., Jarvis,E.D., Kanai,A.,
            Kawaji,H., Kawasawa,Y., Kedzierski,R.M., King,B.L., Konagaya,A.,
            Kurochkin,I.V., Lee,Y., Lenhard,B., Lyons,P.A., Maglott,D.R.,
            Maltais,L., Marchionni,L., McKenzie,L., Miki,H., Nagashima,T.,
            Numata,K., Okido,T., Pavan,W.J., Pertea,G., Pesole,G.,
            Petrovsky,N., Pillai,R., Pontius,J.U., Qi,D., Ramachandran,S.,
            Ravasi,T., Reed,J.C., Reed,D.J., Reid,J., Ring,B.Z., Ringwald,M.,
            Sandelin,A., Schneider,C., Semple,C.A., Setou,M., Shimada,K.,
            Sultana,R., Takenaka,Y., Taylor,M.S., Teasdale,R.D., Tomita,M.,
            Verardo,R., Wagner,L., Wahlestedt,C., Wang,Y., Watanabe,Y.,
            Wells,C., Wilming,L.G., Wynshaw-Boris,A., Yanagisawa,M., Yang,I.,
            Yang,L., Yuan,Z., Zavolan,M., Zhu,Y., Zimmer,A., Carninci,P.,
            Hayatsu,N., Hirozane-Kishikawa,T., Konno,H., Nakamura,M.,
            Sakazume,N., Sato,K., Shiraki,T., Waki,K., Kawai,J., Aizawa,K.,
            Arakawa,T., Fukuda,S., Hara,A., Hashizume,W., Imotani,K., Ishii,Y.,
            Itoh,M., Kagawa,I., Miyazaki,A., Sakai,K., Sasaki,D., Shibata,K.,
            Shinagawa,A., Yasunishi,A., Yoshino,M., Waterston,R., Lander,E.S.,
            Rogers,J., Birney,E. and Hayashizaki,Y.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
  MEDLINE   22354683
   PUBMED   12466851
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Carninci,P.,
            Fukuda,S., Hashizume,W., Hayashida,K., Hirozane,T., Hori,F.,
            Imotani,K., Ishii,Y., Itoh,M., Kagawa,I., Kawai,J., Kojima,Y.,
            Kondo,S., Konno,H., Koya,S., Miyazaki,A., Murata,M., Nakamura,M.,
            Nomura,K., Numazaki,R., Ohno,M., Ohsato,N., Saito,R., Sakazume,N.,
            Sano,H., Sasaki,D., Sato,K., Shibata,K., Shiraki,T., Tagami,M.,
            Takeda,Y., Waki,K., Watahiki,A., Muramatsu,M. and Hayashizaki,Y.
            Direct Submission
            Computational Analysis of Full-Length Mouse cDNAs Compared with
            Human Genome Sequences. Mamm. Genome. 12, 673-677 (2001)
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
FEATURES             Location/Qualifiers
     source          1..952
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="8430417G17"
                     /sex="mixed"
                     /tissue_type="lung"
                     /dev_stage="16 days embryo"
                     /lab_host="DH10B"
                     /clone_lib="RIKEN full-length enriched, 16 days embryo
                     lung"
                     /note="Site_1: SalI; Site_2: BamHI; cDNA library was
                     prepared and sequenced in Mouse Genome Encyclopedia
                     Project of Genome Exploration Research Group in Riken
                     Genomic Sciences Center and Genome Science Laboratory in
                     RIKEN. Division of Experimental Animal Research in Riken
                     contributed to prepare mouse tissues. 1st strand cDNA was
                     primed with a primer [5'
                     GAGAGAGAGAAGGATCCAAGAGCTCTTTTTTTTTTTTTTTTVN 3'], cDNA was
                     prepared by using trehalose thermo-activated reverse
                     transcriptase and subsequently enriched for full-length by
                     cap-trapper. cDNA went through one round of normalization
                     to Rot = 10.0 and subtraction to Rot = 185.0. Second
                     strand cDNA was prepared with the primer adapter of
                     sequence [5' GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC
                     3']. cDNA was cloned into the XhoI and BamHI sites.
                     Vector: a modified pBluescript KS(+) after bulk excision
                     from Lambda FLC I. Cloning sites, 5' end: SalI; 3' end:
                     BamHI"

ORIGIN 



Query Match 36.7%; Score 416.2; DB 13; Length 952;

Best Local Similarity 80.9%; Pred. No. 1.9e-103;

Matches 509; Conservative 0; Mismatches 118; Indels 2; Gaps 2;



Qy 1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 

Db 173 ATGAGCACAGACTGTGCGGGCAACTCCACCTGCCCTGTCAACAGTACGGAGGAAGACCCG 232 

Qy 61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 

Db 233 CCCGTGGGAATGGAGGGCCATGCGAATCTAAAGCTGCTTTTTACAGTGCTCTCGGCTGTG 292

Qy 121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180 

III I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 293 ATGGTGGGTTTGGTCATGTTCTCTTTTGGATGTTCTGTGGAGAGTCAGAAGCTCTGGTTG 352

Qy 181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240 

II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II I I II I I II I I I I I I 

Db 353 CACCTCAGAAGACCCTGGGGCATCGCAGTGGGCCTGCTTTCCCAGTTTGGACTTATGCCT 412 

Qy 241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300

I I I I I I I I I I I I I II I I I I I I I I IMIII III I I I I I I I I I I I I I I II 

Db 413 CTGACAGCTTATCTGTTAGCCATTGGCTTCGGTCTGAAACCATTCCAAGCTATTGCTGTC 472 

Qy 301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 

Db 473 CTCATGATGGGGAGCTGCCCTGGGGGCACCATCTCTAATGTTCTCACCTTCTGGGTTGAT 532

Qy 361 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 420

ii i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 533 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACAGTGGCCGCCCTGGGAATG 592

Qy 421 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 480

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 

Db 593 ATGCCTCTCTGCCTCTACATCTACACCCGGTCCTGGACTCTGACACAGAAACTCGTCATT 652 

Qy 481 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 540

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III III 

Db 653 CCGTATCAGAGCATAGGAATTACCCTTGTGTCCCTGGTGGTTCCTGTGGCTTCTGGCGTC 712 

Qy 541 TATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTT 600

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 713 TATGTGAATTATAAGTGGCC-AAACAAGCCACAGTCATTCTCTAAGTCGGAGACATTCTG 771 

Qy 601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGT 629 

II I I I I I I I I I I I I I I I I I I I I 

Db 772 GGTGGCAT-TTCCTGCTGGTGGTGGCGGT 799 



RESULT 4 
BG872314
LOCUS       BG872314      972 bp    mRNA    linear   EST 29-MAY-2001
DEFINITION  602790977F1 NCI_CGAP_SG2 Mus musculus cDNA clone IMAGE:4922227 5',
            mRNA sequence.
ACCESSION   BG872314
VERSION     BG872314.1  GI:14222854
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 972)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished (1999)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Jeffrey E. Green, M.D.
            cDNA Library Preparation: Life Technologies, Inc.
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Incyte Genomics, Inc.
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: LLAM10841 row: j column: 20
            High quality sequence stop: 786.
FEATURES             Location/Qualifiers
     source          1..972
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="FVB/N"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:4922227"
                     /lab_host="DH10B (T1 phage-resistant)"
                     /clone_lib="NCI_CGAP_SG2"
                     /note="Organ: salivary gland; Vector: pCMV-SPORT6; Site_1:
                     NotI; Site_2: SalI; Cloned unidirectionally. Primer: Oligo
                     dT. Average insert size 1.3 kb. Constructed by Life
                     Technologies. Note: this is a NCI_CGAP Library."
ORIGIN



Query Match 29.6%; Score 335.4; DB 12; Length 972;

Best Local Similarity 73.8%; Pred. No. 4e-81;

Matches 458; Conservative 0; Mismatches 151; Indels 12; Gaps 2;



Qy 514 CTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAA 573
III I I I I I I I II I I III I I I I I I I I I I I I I I II I I I I I I I I III I I
Db 1 CTGGTGGTTCCTGTGGCTTCTGGCGTCTATGTGAATTATAGGTGGCCAAAGCAAGCAACG 60

Qy 574 ATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCT 633
I I I I I I I I I I I I II III II I I I I I I I I I I I I I I I I I I I I I I I I I I II
Db 61 GTCATTCTCAAGGTCGGAGCCATTCTGGGTGGCATGCTCCTCCTGGTGGTGGCAGTTACT 120

Qy 634 GGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTCTGACCATCAGTTTC 693
II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 121 GGCATGGTCCTGGCAAAAGG---CTGGAACACAGACGTCACTCTTCTGGTCATCAGCTGC 177

Qy 694 ATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCT 753
II I I II III I I I I I I I I I I I I II II I I I I I I I I I I I MINIM III
Db 178 ATTTTCCCCTTGGTCGGCCATGTCACAGGCTTCCTGCTGGCATTCCTCACCCACCAATCT 237

Qy 754 TGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATC 813
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II III I I I I I I I I
Db 238 TGGCAAAGGTGCAGGACCATTTCCATAGAGACTGGCGCTCAGAACATCCAGCTGTGCATC 297

Qy 814 ACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTG 873
I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I
Db 298 GCCATGCTGCAGCTGTCCTTCTCTGCTGAGTACCTGGTCCAGCTGCTAAACTTTGCATTG 357



Qy 874 GCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAGCATATCAGACGTAC 933

I I I I I I I I I I I I I I I I I III III I II I I I I I I II I I I I I I I II I I I I 
Db 358 GCCTATGGACTCTTCCAAGTGCTGCACGGGCTGCTCATTGTCGCAGCATATCAGGCATAC 417

Qy 934 AAGAGGAGATTGAAGAACAAACATGGAAAAAAGAACTCAGGTTGCACAGAAGTCTGCCAT 993

I I I I I I I I I I I I I III I I I I I II I I I I II I I I I I II I I I I 

Db 418 AAGAGGAGGCAGAAGAGTAAATGCAGGAGACAGCACCCGGATTGCCCAGACGTCTGCTAC 477

Qy 994 ACGAGGAAATCGACTTCTTCCAGAGAGACCAATGCCTTCTTGGAGGTGAATGAAGAAGGT 1053

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 478 GAGAAGCA---------GCCCAGAGAGACCAGTGCTTTCTTGGATAAAGGGGATGAGGCT 528

Qy 1054 GCCATCACTCCTGGGCCACCAGGGCCAATGGATTGCCACAGGGCTCTCGAGCCAGTTGGC 1113 

I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I III 

Db 529 GCCGTAACTCTGGGGCCAGTGCAGCCAGAGCAGCACCACAGGGCTGCTGAGCTGACTAGC 588

Qy 1114 CACATCACTTCATGTGAATAG 1134

I I I I I I I I I I I I I I I I I I I 

Db 589 CACATTCCTTCATGTGAATAG 609



RESULT 5 
BE181226 
LOCUS       BE181226                 356 bp    mRNA    linear   EST 22-JUN-2000
DEFINITION  CM2-HT0630-220300-125-f05 HT0630 Homo sapiens cDNA, mRNA sequence.
ACCESSION   BE181226
VERSION     BE181226.1  GI:8660402
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 356)
  AUTHORS   Dias Neto,E., Garcia Correa,R., Verjovski-Almeida,S., Briones,M.R.,
            Nagai,M.A., da Silva,W. Jr., Zago,M.A., Bordin,S., Costa,F.F.,
            Goldman,G.H., Carvalho,A.F., Matsukuma,A., Baia,G.S., Simpson,D.H.,
            Brunstein,A., deOliveira,P.S., Bucher,P., Jongeneel,C.V.,
            O'Hare,M.J., Soares,F., Brentani,R.R., Reis,L.F., de Souza,S.J. and
            Simpson,A.J.
  TITLE     Shotgun sequencing of the human transcriptome with ORF expressed
            sequence tags
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 97 (7), 3491-3496 (2000)
  MEDLINE   20202663
   PUBMED   10737800
COMMENT     Contact: Simpson A.J.G.
            Laboratory of Cancer Genetics
            Ludwig Institute for Cancer Research
            Rua Prof. Antonio Prudente 109, 4 andar, 01509-010, Sao Paulo-SP,
            Brazil
            Tel: +55-11-2704922
            Fax: +55-11-2707001
            Email: asimpson@ludwig.org.br
            This sequence was derived from the FAPESP/LICR Human Cancer Genome
            Project. This entry can be seen in the following URL
            (http://www.ludwig.org.br/scripts/gethtml2.pl?tl-&t2=CM2-HT0630-220
            300-125-f05&t3=2000-03-22&t4=l)
            Seq primer: puc 18 forward
            High quality sequence start: 12
            High quality sequence stop: 354.
FEATURES             Location/Qualifiers
     source          1..356
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /dev_stage="Adult"
                     /clone_lib="HT0630"
                     /note="Organ: head_neck; Vector: puc18; Site_1: SmaI;
                     Site_2: SmaI; A mini-library was made by cloning products
                     derived from ORESTES PCR (U.S. Letters Patent application
                     No. 196,716 - Ludwig Institute for Cancer Research)
                     profiles into the pUC 18 vector. Reverse transcription of
                     tissue mRNA and cDNA amplification were performed under
                     low stringency conditions."
ORIGIN



Query Match 28.9%; Score 327.4; DB 10; Length 356;
Best Local Similarity 97.7%; Pred. No. 4.3e-79;
Matches 343; Conservative 0; Mismatches 6; Indels 2; Gaps 1;



Qy 411 CCTGGGAATGATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAA 470
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   6 CCTGGGAATGATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAA 65

Qy 471 TCTCACCATTCCTTATCAGAACATAGGAA--TTACCCTTGTGTGCCTGACCATTCCTGTG 528
       |||||||||||||||||||||||||||||  |  ||  ||||||||||||||||||||||
Db  66 TCTCACCATTCCTTATCAGAACATAGGAAGTTACCCTATGTGTGCCTGACCATTCCTGTG 125

Qy 529 GCCTTTGGTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATT 588
       ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 126 GCCTTAGGTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATT 185

Qy 589 GGGGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCG 648
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 186 GGGGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCG 245

Qy 649 AAAGGATCTTGGAATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATT 708
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 246 AAAGGATCTTGGAATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATT 305

Qy 709 GGCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAA 759
       ||||||||||||||||||||||||||||||||||||||||||||||||| |
Db 306 GGCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCCA 356



RESULT 6 
AY413909 
LOCUS       AY413909                1047 bp    DNA     linear   GSS 17-DEC-2003
DEFINITION  Mus musculus SLC10A2 gene, VIRTUAL TRANSCRIPT, partial sequence,
            genomic survey sequence.
ACCESSION   AY413909
VERSION     AY413909.1  GI:39769871
KEYWORDS    GSS.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 1047)
  AUTHORS   Clark,A.G., Glanowski,S., Nielson,R., Thomas,P., Kejariwal,A.,
            Todd,M.A., Tanenbaum,D.M., Civello,D.R., Lu,F., Murphy,B.,
            Ferriera,S., Wang,G., Zheng,X.H., White,T.J., Sninsky,J.J.,
            Adams,M.D. and Cargill,M.
  TITLE     Inferring nonneutral evolution from human-chimp-mouse orthologous
            gene trios
  JOURNAL   Science 302 (5652), 1960-1963 (2003)
   PUBMED   14671302
REFERENCE   2  (bases 1 to 1047)
  AUTHORS   Clark,A.G., Glanowski,S., Nielson,R., Thomas,P., Kejariwal,A.,
            Todd,M.A., Tanenbaum,D.M., Civello,D.R., Lu,F., Murphy,B.,
            Ferriera,S., Wang,G., Zheng,X.H., White,T.J., Sninsky,J.J.,
            Adams,M.D. and Cargill,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-NOV-2003) Celera Genomics, 45 West Gude Drive,
            Rockville, MD 20850, USA
COMMENT     This sequence was made by sequencing genomic exons and ordering
            them based on alignment.
FEATURES             Location/Qualifiers
     source          1..1047
                     /organism="Mus musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:10090"
     gene            <1..>1047
                     /gene="SLC10A2"
                     /locus_tag="HCM5047"
ORIGIN



Query Match 27.1%; Score 307.8; DB 29; Length 1047;
Best Local Similarity 59.4%; Pred. No. 1.7e-73;
Matches 522; Conservative 0; Mismatches 357; Indels 0; Gaps 0;



Qy  80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139
       ||| ||  ||  |    ||  | |     |||  |||  |  |  | |   || | ||||
Db  80 ATGCAATTCTCAATACAGTGATGAGCACTGTGCTCACCATCCTCTTAGCCATGGTGATGT 139

Qy 140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199
       | ||| |||| ||    |||||  |||  ||| |       || || | |||||| ||||
Db 140 TTTCTATGGGGTGCAATGTGGAAGTCCACAAGTTCCTAGGACATATAAAGAGACCATGGG 199

Qy 200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259
       | ||    |||||  | ||||| ||||||||  |||||||| | ||||  | | |||||
Db 200 GTATCTTCGTGGGCTTCCTCTGTCAGTTTGGAATCATGCCTCTCACAGGCTTTATCCTGT 259

Qy 260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319
       |  |   || |    |    || || || ||| | |  || || || ||||| |||||||
Db 260 CTGTGGCCTCTGGCATCCTTCCTGTACAGGCTGTAGTGGTGCTAATTATGGGTTGCTGCC 319

Qy 320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379
       | || || ||   ||| || ||  |  |||  ||| | ||||| || ||||| |||||
Db 320 CTGGAGGAACTGGCTCCAATATCCTGGCCTATTGGATAGATGGCGACATGGACCTCAGTG 379

Qy 380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439
       | || ||||| || || |||||  ||   ||||| ||||||||||| || ||| | |
Db 380 TTAGCATGACCACTTGCTCCACACTGCTTGCCCTTGGAATGATGCCTCTTTGCCTCTTCG 439

Qy 440 T C T AC AC CTGGTCCTG G AGT C T T C AG C AG AAT C T C AC CAT T C C T TAT C AG AAC AT AG G AA 499 

I I I I I I I I I Ml I I I I I I I I I I I I I I I I I' I I I 

Db 440 TCTACACCAAGATGTGGGTTGACTCGGGAACGATTGTGATTCCCTATGATAGCATTGGTA 499 

Qy 500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559 

I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 500 TTTCTCTGGTTGCTCTTGTTATTCCTGTTTCCTTTGGAATGTTTGTAAATCACAAATGGC 559 

Qy 560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619 

I I I I I I I I I II I I I I I I I I I I I III MM I I I i II I I 

Db 560 C AC AAAAAG C GAAGAT T AT ACT T AAAAT T G GAT C CAT C AC AGGT GT AAT T C T CAT T GT GC 619 

Qy 620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 67 9 

I I I I I I I I I I I I I I III I M M II II II 

Db 620 T CAT AG CT GT GAT T GGAGGAAT ACT GT AC C AAAGTG C CT G GAT CAT T GAAC CCAAACT GT 67 9 

Qy 680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 73 9 

II II I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 680 GGATTATAGGAACAATATTCCCTATAGCTGGCTACAGCCTGGGTTTCTTCCTGGCTAGAC 739 

Qy 7 40 TTACC C ACCAGT CT T GGCAAAGGT GCAGGACAATTT CCTTAGAAACT GGAGCT CAGAATA 79 9 

I I II I I II I I I I II I I I I I I I I I I I II I I I I I I I I I I I I 

Db 74 0 TAGCTGGTCAACCCTGGTACAGGTGCCGAACAGTAGCCTTGGAAACTGGAATGCAGAACA 799 

Qy 800 TT CAGATGT GCAT CACCAT GCT CCAGT TAT CTTT CACT GCT GAGCACTT GGT CCAGATGT 85 9 

I I II I I I I I I I I II I I II I I I I I I I I I I I I I I I I I II I 
Db 800 CTCAGCTGTGCTCCACCATTGTACAGCTCTCCTTCTCCCCCGAGGATCTCAACCTGGTGT 859 

Qy 860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919 

II I I II I I I I I I I I I I I I I I I I I I III III III 

Db 8 60 T CAC CT T C C CACT CAT C TAT ACT GT T T T C C AGCT C GT CT T T GC AG C AGT AAT AT T AGGAA 919 

Qy 920 CAT AT C AGAC GT AC AAGAG GAGAT T GAAGAACAAAC AT G 958 

III I I I I I I I I II I I I I III 

Db 920 TTT AT GT CACAT ACAGGAAAT GTTAT GGAAAAAAT GAT G 958 



RESULT 7 
AY413907 
LOCUS       AY413907                1047 bp    DNA     linear   GSS 17-DEC-2003
DEFINITION  Homo sapiens SLC10A2 gene, VIRTUAL TRANSCRIPT, partial sequence,
            genomic survey sequence.
ACCESSION   AY413907
VERSION     AY413907.1  GI:39769869
KEYWORDS    GSS.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1047)
  AUTHORS   Clark,A.G., Glanowski,S., Nielson,R., Thomas,P., Kejariwal,A.,
            Todd,M.A., Tanenbaum,D.M., Civello,D.R., Lu,F., Murphy,B.,
            Ferriera,S., Wang,G., Zheng,X.H., White,T.J., Sninsky,J.J.,
            Adams,M.D. and Cargill,M.
  TITLE     Inferring nonneutral evolution from human-chimp-mouse orthologous
            gene trios
  JOURNAL   Science 302 (5652), 1960-1963 (2003)
   PUBMED   14671302
REFERENCE   2  (bases 1 to 1047)
  AUTHORS   Clark,A.G., Glanowski,S., Nielson,R., Thomas,P., Kejariwal,A.,
            Todd,M.A., Tanenbaum,D.M., Civello,D.R., Lu,F., Murphy,B.,
            Ferriera,S., Wang,G., Zheng,X.H., White,T.J., Sninsky,J.J.,
            Adams,M.D. and Cargill,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-NOV-2003) Celera Genomics, 45 West Gude Drive,
            Rockville, MD 20850, USA
COMMENT     This sequence was made by sequencing genomic exons and ordering
            them based on alignment.
FEATURES             Location/Qualifiers
     source          1..1047
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
     gene            <1..>1047
                     /gene="SLC10A2"
                     /locus_tag="HCM5047"
ORIGIN

Query Match 26.3%; Score 297.8; DB 29; Length 1047; 

Best Local Similarity 58.5%; Pred. No. 9.9e-71; 

Matches 518; Conservative 0; Mismatches 367; Indels 0; Gaps 0; 

Qy 80 AT GGAAACCT GGAGCT CGTTTT CACAGT GGT GT CCACT GT GAT GAT GGGGCT GCT CAT GT 139 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 8 0 ATAACAT CCTAAGTGT GGT CCTAAGTACGGT GCT GACCAT CCT GTT GGCCT T GGT GAT GT 139 

Qy 14 0 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199 

I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I 

Db 140 T CT CCAT GG GAT GCAAC GT GGAAAT CAAGAAAT T T CT AGG GCAC AT AAAG C GGCCGT GG G 199 

Qy 200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259 

I I I I I Mill I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 200 GCATTTGTGTTGGCTTCCTCTGTCAGTTTGGAATCATGCCCCTCACAGGATTCATCCTGT 259 

Qy 260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319

I I I I I I I II I I II I I II I I I I I I I I I I I I I I I I I I 

Db 260 CGGTGGCCTTTGACATCCTCCCGCTCCAGGCCGTAGTGGTGCTCATTATAGGATGCTGCC 319 

Qy 320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379 

I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I 
Db 320 CTGGAGGAACTGCCTCCAATATCTTGGCCTATTGGGTCGATGGCGACATGGACCTGAGCG 37 9 

Qy 380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439 

I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I 
Db 380 TCAGCATGACCACATGCTCCACACTGCTTGCCCTCGGAATGATGCCGCTGTGCCTCCTTA 439 

Qy 440 T CTACACCT GGT CCT GGAGTCTTCAGCAGAAT CT CAC CATT CCTT AT CAGAACATAGGAA 499 

I I I I I I I III II II I I I II I I I I I I I I I I I I I 

Db 440 T CT AT AC CAAAAT GT G GGT C GACT CT GGGAGC AT C GT AAT T C C CT AT GAT AAC AT AGGT A 499 



Qy 500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 500 CATCTCTGGTTGCTCTCGTTGTTCCTGTTTCCATTGGAATGTTTGTTAATCACAAATGGC 559 

Qy 560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619 

I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I II I I 

Db 560 CCCAAAAAGCAAAGATCATACTTAAAATTGGGTCCATCGCGGGCGCCATCCTCATTGTGC 619 

Qy 620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679 

I I I I I I I I I I I I II III I I I I I I II M 

Db 620 TCATAGCTGTGGTTGGAGGAATATTGTACCAAAGCGCCTGGATCATTGCTCCCAAACTGT 679

Qy 68 0 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739 

II II I I I I I II I I I I II I I I I I I I I I I I I I I I 

Db 68 0 GGATTATAGGAACAATATTTCCTGTGGCGGGTTACTCCCTGGGGTTTCTTCTGGCTAGAA 73 9 

Qy 74 0 TT AC C C AC C AGT CT T G G CAAAGGT GC AG GACAAT T T C CT T AG AAACT GGAG CT C AGAAT A 799 

III I I I I I I I I I I I I I II I I I I I I I I I I I I Mill! 

Db 74 0 TTGCTGGTCTACCCTGGTACAGGTGCCGAACGGTTGCTTTTGAAACGGGGATGCAGAACA 799 

Qy 800 T T C AGAT GT GCAT C AC CAT GCT C C AGT TAT CT T T C ACT G CT GAG C ACT T GGT C CAGAT GT 859 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 800 C GCAGC T AT GTT CC AC CAT CGT T CAGCT CT CCT T C ACT C CT GAGGAGCT CAAT GT C GT AT 859 

Qy 860 T GAGT T T C CC ACT GGC CT AT GGACT CTT C C AG CT GAT AGAT G GAT T T CT TAT T GT T GC AG 919 

II I I I I I I I III I I I I I I I I I I II I I III 

Db 860 TCACCTTCCCGCTCATCTACAGCATTTTCCAGCTCGCCTTTGCCGCAATATTCTTAGGAT 919 

Qy 92 0 CAT AT CAGACGT ACAAGAGGAGATT GAAGAACAAACAT GGAAAAA 964 

III I I I I I I I I I I I I I 1 I I I I I I 

Db 920 T T TAT GT GGCAT ACAAGAAAT GT CAT GGAAAAAACAAGG CAGAAA 964 



RESULT 8 
CF998755 
LOCUS       CF998755                 773 bp    mRNA    linear   EST 25-NOV-2003
DEFINITION  AGENCOURT_16388570 NIH_ZGC_7 Danio rerio cDNA clone IMAGE:7040629
            5', mRNA sequence.
ACCESSION   CF998755
VERSION     CF998755.1  GI:38519606
KEYWORDS    EST.
SOURCE      Danio rerio (zebrafish)
  ORGANISM  Danio rerio
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Actinopterygii; Neopterygii; Teleostei; Ostariophysi;
            Cypriniformes; Cyprinidae; Danio.
REFERENCE   1  (bases 1 to 773)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished (1999)
COMMENT     Contact: Daniela S. Gerhard, Ph.D.
            Office of Cancer Genomics
            National Cancer Institute / NIH
            Bldg. 31 Rm10A07 Bethesda, MD 20892
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Len Zon, Harvard
            cDNA Library Preparation: Open Biosystems
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Agencourt Bioscience Corporation
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: LLAM14795 row: m column: 11
            High quality sequence stop: 663.
FEATURES             Location/Qualifiers
     source          1..773
                     /organism="Danio rerio"
                     /mol_type="mRNA"
                     /db_xref="taxon:7955"
                     /clone="IMAGE:7040629"
                     /tissue_type="whole body"
                     /lab_host="DH10B"
                     /clone_lib="NIH_ZGC_7"
                     /note="Vector: pExpress1; Site_1: NotI; Site_2: EcoRV;
                     Bulk tissue was collected from a whole adult individual
                     from the Tuebingen strain. 1st strand cDNA was primed with
                     a Not I - oligo(dT) primer, double-stranded cDNA was
                     cloned into the Not I and EcoRV sites of pExpress-1.
                     Library was size-selected for >1 kb fragments and
                     normalized. A non-normalized version of this library is
                     also available (NIH_ZGC_10). Library was constructed by
                     Open Biosystems (Huntsville, AL)"
ORIGIN



Query Match 24.9%; Score 282.8; DB 14; Length 773;
Best Local Similarity 61.7%; Pred. No. 1.2e-66;
Matches 466; Conservative 0; Mismatches 288; Indels 1; Gaps 1;



Qy 115 ACTGTGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTG 174
       || || ||| |||   || | ||||| ||  |||| ||  | || |||    | || |||
Db  10 ACCGTCATGTTGGCCATGGTTATGTTTTCAATGGGCTGCACTGTTGAGGCTAGAAAACTG 69

Qy 175 TGGTCGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTC 234
       |||  |||| |  | |||||||||||||||  | | ||  | || |||||||||||  ||
Db  70 TGGGGGCACGTTCGCAGACCCTGGGGCATTTTTATAGGTTTCCTTTGCCAGTTTGGCATC 129

Qy 235 ATGCCTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATT 294
       |||||||| ||||| |   | ||  |  |    ||     ||  ||||||||| ||  |
Db 130 ATGCCTTTCACAGCCTTCATACTTTCATTGCTTTTCAACGTGCTGCCAGTCCAGGCGGTG 189

Qy 295 GCTGTTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGG 354
       |   |  ||||||||||||||||||| || ||  | | ||||||  |||||  || ||||
Db 190 GTCATCATCATCATGGGCTGCTGCCCTGGAGGATCAAGCTCTAATGTTTTCTGCTACTGG 249

Qy 355 GTTGATGGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTG 414
        |||||||||| ||||| || |||||||| |||||| | |||||  |  |    || |||
Db 250 CTTGATGGAGACATGGACCTAAGCATCAGCATGACAGCGTGTTCTTCAATTTTGGCTCTG 309

Qy 415 GGAATGATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTC 474
       ||||||||||| || ||  |     | ||||||     ||||| |       |     |
Db 310 GGAATGATGCCTCTTTGTCTGCTCATTTACACCACAATCTGGACTGCAGGCGATGCGATT 369

Qy 475 ACCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTT 534
          ||||||||  | || || || || || || ||| |  ||    | ||||| |   ||
Db 370 CAGATTCCTTACGACAATATTGGGATCACACTGGTGAGTTTGCTTGTGCCTGTCGGTCTT 429

Qy 535 GGT GT C TAT GT GAAT T AC AGAT G GC CAAAAC AAT C CAAAAT CATT CT C AAGAT T G GG G C C 594 

II II I I I I I III I I I I I I I I I I I II I I I I I I I I I I I I I I 

Db 430 GGGATGTTAGTGAAACACAAGTGGCCTAAAGCTGCCAAAAAGATCCTCAAGGTTGGATCT 489 

Qy 595 GTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGA 654 

I I I I I I I 11 I I I I III I I I I I I I I I I I I I I I I 

Db 4 90 GTGGTGGGAATTGTCCTCATCATCGTCATTGCAGTAATTGGTGGTGTGCTTTATCAGTCC 549 

Qy 655 T C T T G GAAT T C AGAC AT C AC CCT T CT GAC CAT C AGTT T CAT CTTTCCTTT GAT T GGC C AT 714 

I I I II I I I I I I I II I I I II I I I I I I I I I I I I I I I 

Db 550 TCATGGACCATTGCTCCCTCACTTTGGATCATTGGTACCATTTATCCATTTATTGGATTT 609 

Qy 715 GTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATT 774 

II I I I I I I I I I I I I III III I I I I I I I I I I I II I II I I I 

Db 610 GGCTTAGGGTTCCTCTTGGCACGCTTTGTGGGCCAACCTTGGCACAGGTGCCGCACCATT 669 

Qy 775 T C CT T AGAAACT GGAG CT CAGAAT ATT C AGAT GT GC AT C AC CAT GCT C C AGT TAT CT T T C 834 

I I I I I I I I I I I 1 II I I I I I I I I I I I I I II I I I I I 

Db 670 GCTCTAGAAAC-GGCATGCAGAACGCCCAGCTGGGCAGTACTATTTACCCAGTGTCCTTT 728 

Qy 835 ACT GCT GAGC ACT T G GT C C AG AT GT T G AGT T T C C C 869 

I III III I I I I I I I I I 

Db 729 AGCCCTGCAGAGCTTGANGTCATGTTCGCGTTTCC 763 



RESULT 9 
BY779230 
LOCUS       BY779230                 364 bp    mRNA    linear   EST 10-DEC-2003
DEFINITION  BY779230 RIKEN full-length enriched, 17.5 days embryo whole body
            Mus musculus cDNA clone L930133F11 5', mRNA sequence.
ACCESSION   BY779230
VERSION     BY779230.1  GI:39705869
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 364)
  AUTHORS   Carninci,P., Waki,K., Shiraki,T., Konno,H., Shibata,K., Itoh,M.,
            Aizawa,K., Arakawa,T., Ishii,Y., Sasaki,D., Bono,H., Kondo,S.,
            Sugahara,Y., Saito,R., Osato,N., Fukuda,S., Sato,K., Watahiki,A.,
            Hirozane-Kishikawa,T., Nakamura,M., Shibata,Y., Yasunishi,A.,
            Kikuchi,N., Yoshiki,A., Kusakabe,M., Gustincich,S., Beisel,K.,
            Pavan,W., Aidinis,V., Nakagawara,A., Held,W.A., Iwata,H., Kono,T.,
            Nakauchi,H., Lyons,P., Wells,C., Hume,D.A., Fagiolini,M.,
            Hensch,T.K., Brinkmeier,M., Camper,S., Hirota,J., Mombaerts,P.,
            Muramatsu,M., Okazaki,Y., Kawai,J. and Hayashizaki,Y.
  TITLE     Targeting a complex transcriptome: the construction of the mouse
            full-length cDNA encyclopedia
  JOURNAL   Genome Res. 13 (6B), 1273-1289 (2003)
  MEDLINE   22703353
   PUBMED   12819125
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.
            Please visit our web site (http://genome.gsc.riken.jp/) for
            further details.
FEATURES             Location/Qualifiers
     source          1..364
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="L930133F11"
                     /tissue_type="whole body"
                     /dev_stage="17.5 days embryo"
                     /clone_lib="RIKEN full-length enriched, 17.5 days embryo
                     whole body"
ORIGIN

Query Match 23.4%; Score 265.4; DB 13; Length 364; 

Best Local Similarity 83.2%; Pred. No. 5.9e-62; 

Matches 302; Conservative 0; Mismatches 61; Indels 0; Gaps 0; 

Qy 37 GCCAACAGTTCAGAGGAGGAGCTGCCAGT GGGACT GGAGGTGCATGGAAACCT GGAGCT C 96 

i I I I I I I ! I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 2 GTCAACAGTACGGAGGAAGACCCGCCCGTGGGAATGGAGGGCCATGCGAATCTAAAGCTG 61 

Qy 97 GTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCC 156 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 62 CTTTTTACAGTGCTCTCGGCTGTGATGGTGGGTTTGGTCATGTTCTCTTTTGGATGTTCT 121 

Qy 157 GTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTG 216 

I II I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I II I II 

Db 122 GT GGAGAGT CAGAAGCT CT GGTT GC AC CT CAGAAGACC C T GGGGCAT C GCAGT GG G CC T G 181 

Qy 217 CTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTG 276 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 182 CTTTCCCAGTTTGGACTTATGCCTCTGACAGCTTATCTGTTAGCCATTGGCTTCGGTCTG 241 

Qy 277 AAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCT 336 

I I I I I I I MINIM I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 242 AAACCATTCCAAGCTATTGCTGTCCTCATGATGGGGAGCTGCCCTGGGGGCACCATCTCT 301 

Qy 337 AAC AT TT T C AC CTT CT GGGTT GAT GGAGAT AT GGAT CT CAGCAT CAGT AT G ACAAC CT GT 396 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 

Db 302 AAT GT T C T CAC CTT CT GGGTT GAT GGAGAT AT GGAT C T CAG CAT CAGT AT GACAAC CT GT 361 



Qy 397 TCC 399
       |||
Db 362 TCC 364



RESULT 10 

CB320835 

LOCUS       CB320835                 834 bp    mRNA    linear   EST 04-MAR-2003
DEFINITION  AGENCOURT_12236884 NIH_MGC_136 Mus musculus cDNA clone
            IMAGE:30289461 5', mRNA sequence.
ACCESSION   CB320835
VERSION     CB320835.1  GI:28845070
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 834)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished (1999)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Dr. David Rowe
            cDNA Library Preparation: Invitrogen Corp
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Agencourt Bioscience Corporation
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: NDAM328 row: i column: 22
            High quality sequence stop: 560.
FEATURES             Location/Qualifiers
     source          1..834
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:30289461"
                     /tissue_type="embryonic limb, maxilla and mandible"
                     /lab_host="DH10B (phage-resistant)"
                     /clone_lib="NIH_MGC_136"
                     /note="Vector: pCMV-SPORT6.1; Site_1: EcoRV; Site_2: NotI;
                     Normalized, full-length enriched library from pool of
                     mouse embronic limb, maxilla and mandible, embryonic day
                     17.5, 18.5 and newborn (mandible (5, 4 and 1 limb and jaw
                     equivalents from respective days). Cloned directionally,
                     oligo-dT primed (5'-GACTAGTTCTAGATCGCGAGCGGCCGCCC(T)15-3'.
                     Size selected for the >1kb fragments, average insert size
                     1.2 kb. Normalization to Cot 7.5. Tissue contributed by
                     David Rowe; library constructed by ResGen, Invitrogen
                     Corp. Note: this is a NIH_MGC Library."
ORIGIN



Query Match 22.6%; Score 256.4; DB 14; Length 834;
Best Local Similarity 72.5%; Pred. No. 2.5e-59;
Matches 364; Conservative 0; Mismatches 126; Indels 12; Gaps 2;



Qy 633 TGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTCTGACCATCAGTTT 692
       |||  |||||||||| |||||    |||||  ||||| |||| ||||||  |||||| |
Db   1 TGGCATGGTCCTGGCAAAAGG---CTGGAACACAGACGTCACTCTTCTGGTCATCAGCTG 57



Qy 693 CATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTC 752 

III II II III I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I II I M 

Db 58 CATTTTCCCCTTGGTCGGCCATGTCACAGGCTTCCTGCTGGCATTCCTCACCCACCAATC 117 

Qy 753 T T G G CAAAGGT GCAGGACAAT TT C CT TAGAAACT GGAGCT CAGAAT ATT C AGAT GT GCAT 812 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II II III I I I I I I I 
Db 118 T T GG CAAAGGT GCAGGACC AT TT C CAT AGAGACT GGC GCT CAGAAC AT C C AGCT GT G CAT 177 

Qy 813 C AC CAT G CT C C AGT T AT CT T T C ACT GCT GAGC ACT T G GT C C AGAT GT T GAGT T T C C C ACT 872 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II III 
Db 178 CGCCATGCTGCAGCTGTCCTTCTCTGCCGAGTACCTGGTCCAGCTGCTAAACTTTGCATT 237 

Qy 873 GGC CT AT GGACT CTT C C AG CT GAT AGAT GGAT T T CT TAT T GTT GCAG CAT AT C AGAC GT A 932 

I I I I I I I I I I I I I I I I I III III I I I I I I II I I I I I I I I I I I I I I I 
Db 238 GGCCTACGGACTCTTCCAAGTGCTGCACGGGCTGCTCATTGTCGCAGCATATCAGGCATA 297 

Qy 933 CAAGAG GAGAT T GAAGAACAAAC AT GGAAAAAAGAACT CAGGTT GCACAGAAGT CT GC CA 992 

I I I I I I I! I I II I I III I I I I I I I I I I I I I I I I I I I I I I I I 

Db 298 CAAGAGGAGG CAGAAGAGT AAAT GCAGGAGACAG CAC CC G GAT T GC C CAGACGT CT GCT A 357 

Qy 993 T AC GAG G AAAT C G AC T T C T T C C AG AG AG AC C AAT GCCTTCTTG GAG G T GAAT G AAG AAG G 1052 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 358 CGAGAAGCA---------GCCCAGAGAGACCAGTGCTTTCTTGGATAAAGGGGATGAGGC 408

Qy 1053 TGCCATCACTCCTGGGCCACCAGGGCCAATGGATTGCCACAGGGCTCTCGAGCCAGTTGG 1112 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 409 TGCCGTAACTCTGGGGCCAGTGCAGCCAGAGCAGCACCACAGGGCTGCTGAGCTGACTAG 468 

Qy 1113 C CACAT CACT T CAT GT GAAT AG 1134 

I I I I I I I I I I I I I I I I I I I I 
Db 469 C CACAT T C C T T CAT G T GAAT AG 490 



RESULT 11 

AY413908 

LOCUS       AY413908                 912 bp    DNA     linear   GSS 17-DEC-2003
DEFINITION  Pan troglodytes SLC10A2 gene, VIRTUAL TRANSCRIPT, partial sequence,
            genomic survey sequence.
ACCESSION   AY413908
VERSION     AY413908.1  GI:39769870
KEYWORDS    GSS.
SOURCE      Pan troglodytes (chimpanzee)
  ORGANISM  Pan troglodytes
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Pan.
REFERENCE   1  (bases 1 to 912)
  AUTHORS   Clark,A.G., Glanowski,S., Nielson,R., Thomas,P., Kejariwal,A.,
            Todd,M.A., Tanenbaum,D.M., Civello,D.R., Lu,F., Murphy,B.,
            Ferriera,S., Wang,G., Zheng,X.H., White,T.J., Sninsky,J.J.,
            Adams,M.D. and Cargill,M.
  TITLE     Inferring nonneutral evolution from human-chimp-mouse orthologous
            gene trios
  JOURNAL   Science 302 (5652), 1960-1963 (2003)
   PUBMED   14671302
REFERENCE   2  (bases 1 to 912)
  AUTHORS   Clark,A.G., Glanowski,S., Nielson,R., Thomas,P., Kejariwal,A.,
            Todd,M.A., Tanenbaum,D.M., Civello,D.R., Lu,F., Murphy,B.,
            Ferriera,S., Wang,G., Zheng,X.H., White,T.J., Sninsky,J.J.,
            Adams,M.D. and Cargill,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-NOV-2003) Celera Genomics, 45 West Gude Drive,
            Rockville, MD 20850, USA
COMMENT     This sequence was made by sequencing genomic exons and ordering
            them based on alignment.
FEATURES             Location/Qualifiers
     source          1..912
                     /organism="Pan troglodytes"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9598"
     gene            <1..>912
                     /gene="SLC10A2"
                     /locus_tag="HCM5047"
ORIGIN

Query Match 21.6%; Score 245; DB 29; Length 912;

Best Local Similarity 47.4%; Pred. No. 3.6e-56; 

Matches 386; Conservative 0; Mismatches 428; Indels 0; Gaps 0; 

Qy 80 AT GGAAAC CT GGAGCT C GT T T T C AC AGT GGT GT C C ACT GT GAT GAT GGGGCTGCT CAT GT 139 

II I I I I III II I I I I II I I I I I I I I I I I I I 

Db 80 ATAACATCCTAAGTGTGGTCCTAAGTACGGTGCTGACCATCCTGTTGGCCTTGGTGATGT 139 

Qy 140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199 

MM I I II I I I II I II I I II Ml I I I II I I I I I II M I I 

Db 140 T C T C CAT G G GAT G C AAC GT G G AAAT C AAG AAAT T T C T AG G G C AC AT AAAG CGGCCGTGGG 199 

Qy 2 00 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259 

Mill I I I I I I I I I I I I II M I II I M II I I I I II I I I II M 

Db 2 00 GCATTTGTGTTGGCTTCCTCTGTCAGTTTGGAATCATGCCCCTCACAGGATTCATCCTGT 259 

Qy 2 60 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319 

I I I I I I I II MUM II II I II I I I I I I II I II II 

Db 2 60 CGGTGGCCTTTGACATCCTCCCGCTCCAGGCCGTGGTGGTGCTCATTATAGGATGCTGCC 319 

Qy 320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379 

I I I I I II I I I I I II I I I I I I II I I I 

Db 320 CTGGAGGAACTGCCTCCAATATCTTGGCCTATTNGGTCGNNNNNNNNNNNNNNNNNNNCG 379 

Qy 380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439 

II I I II I I I I I I I I II I I II I I II I I I I I I II II I I I I I I I I I 

Db 380 TCAGCATGACCACATGCTCCACACTGCTTGCCCTCGGAATGATGCCGCTGTGCCTCCTTA 439 

Qy 440 T CTACACCT GGTCCT GGAGT CTT CAGCAGAAT CT CACCATT C CTT AT CAGAACAT AGGAA 499 

I I I I II I I II II II II I I I II I I I I II II I 

Db 440 TCTATACCAAAATGTGGGTCGACTCTGGGAGCATCGTAATTCCCTATGATAACATAGNNN 499

Qy 500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559 

'I I I I II I II II I M M II I I I II II I II I I II I I 

Db 500 NNNNNCTGGTTGCTCTCNNNNTNCCTGTTTCCATTGGGATGTTTGTTAATCACAAATGGC 559 

Qy 560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619

I II I I I I I I II II I I I I II I II I I I I I 



Db 560 CCCAAAAAGCAANNNNNNNNNNNNNNATTGGGTCCATCGCGGGCGCCATCCTCATTNNNN 619



Qy 620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679 

Db 620 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 679

Qy 680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739

Db 680 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 739

Qy 740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799

I I I I I I I I I I I I II I I I I I I I I I I I

Db 740 NNNNNNNNNNNNNNNNNNNNNNGTGCCGAACGGTTGCTTTTGAAACGGGGATGCAGAACA 799

Qy 800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II

Db 800 CGCAGCTATGTTCCACCATCGTTCAGCTGTCCTTCACTCCTGAGGAGCTCAATGTCGTAT 859

Qy 860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCT 893

II I I I I I I I III I I I I I I I I I I

Db 860 TCACCTTCCCGCTCATCTACAGCATTTTCCAGCT 893



RESULT 12 
BY134433 

LOCUS BY134433 372 bp mRNA linear EST 09-DEC-2002 

DEFINITION BY134433 RIKEN full-length enriched, 17.5 days embryo whole body
Mus musculus cDNA clone L930044F15 5', mRNA sequence.
ACCESSION BY134433 

VERSION BY134433.1 GI:26269985 

KEYWORDS EST . 

SOURCE Mus musculus (house mouse) 

ORGANISM Mus musculus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

REFERENCE 1 (bases 1 to 372) 

AUTHORS Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S.,
Nikaido,I., Osato,N., Saito,R., Suzuki,H., Yamanaka,I.,
Kiyosawa,H., Yagi,K., Tomaru,Y., Hasegawa,Y., Nogami,A.,
Schonbach,C., Gojobori,T., Baldarelli,R., Hill,D.P., Bult,C.,
Hume,D.A., Quackenbush,J., Schriml,L.M., Kanapin,A., Matsuda,H.,
Batalov,S., Beisel,K.W., Blake,J.A., Bradt,D., Brusic,V.,
Chothia,C., Corbani,L.E., Cousins,S., Dalla,E., Dragani,T.A.,
Fletcher,C.F., Forrest,A., Frazer,K.S., Gaasterland,T.,
Gariboldi,M., Gissi,C., Godzik,A., Gough,J., Grimmond,S.,
Gustincich,S., Hirokawa,N., Jackson,I.J., Jarvis,E.D., Kanai,A.,
Kawaji,H., Kawasawa,Y., Kedzierski,R.M., King,B.L., Konagaya,A.,
Kurochkin,I.V., Lee,Y., Lenhard,B., Lyons,P.A., Maglott,D.R.,
Maltais,L., Marchionni,L., McKenzie,L., Miki,H., Nagashima,T.,
Numata,K., Okido,T., Pavan,W.J., Pertea,G., Pesole,G.,
Petrovsky,N., Pillai,R., Pontius,J.U., Qi,D., Ramachandran,S.,
Ravasi,T., Reed,J.C., Reed,D.J., Reid,J., Ring,B.Z., Ringwald,M.,
Sandelin,A., Schneider,C., Semple,C.A., Setou,M., Shimada,K.,
Sultana,R., Takenaka,Y., Taylor,M.S., Teasdale,R.D., Tomita,M.,
Verardo,R., Wagner,L., Wahlestedt,C., Wang,Y., Watanabe,Y.,
Wells,C., Wilming,L.G., Wynshaw-Boris,A., Yanagisawa,M., Yang,I.,
Yang,L., Yuan,Z., Zavolan,M., Zhu,Y., Zimmer,A., Carninci,P.,
Hayatsu,N., Hirozane-Kishikawa,T., Konno,H., Nakamura,M.,
Sakazume,N., Sato,K., Shiraki,T., Waki,K., Kawai,J., Aizawa,K.,
Arakawa,T., Fukuda,S., Hara,A., Hashizume,W., Imotani,K., Ishii,Y.,
Itoh,M., Kagawa,I., Miyazaki,A., Sakai,K., Sasaki,D., Shibata,K.,
Shinagawa,A., Yasunishi,A., Yoshino,M., Waterston,R., Lander,E.S.,
Rogers,J., Birney,E. and Hayashizaki,Y.

TITLE Analysis of the mouse transcriptome based on functional annotation
of 60,770 full-length cDNAs

JOURNAL Nature 420, 563-573 (2002) 

MEDLINE 22354683 
PUBMED 12466851 
COMMENT Contact: Yoshihide Hayashizaki 

Laboratory for Genome Exploration Research Group, RIKEN Genomic 
Sciences Center (GSC) , Yokohama Institute 

The Institute of Physical and Chemical Research (RIKEN) 

1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan 

Tel: 81-45-503-9222 

Fax: 81-45-503-9216 

Email: genome-res@gsc.riken.go.jp,
URL: http://genome.gsc.riken.go.jp/

Aizawa,K., Akimura,T., Arakawa,T., Carninci,P., Fukuda,S.,
Hirozane,T., Imotani,K., Ishii,Y., Itoh,M., Kawai,J., Konno,H.,
Miyazaki,A., Murata,M., Nakamura,M., Nomura,K., Numazaki,R.,
Ohno,M., Sakai,K., Sakazume,N., Sasaki,D., Sato,K., Shibata,K.,
Shiraki,T., Tagami,M., Waki,K., Watahiki,A., Muramatsu,M. and
Hayashizaki,Y. Direct Submission

Computational Analysis of Full-Length Mouse cDNAs Compared with
Human Genome Sequences Mamm. Genome. 12, 673-677 (2001)

Normalization and subtraction of cap-trapper-selected cDNAs to 
prepare full-length cDNA libraries for rapid discovery of new 
genes. Genome Res. 10 (10), 1617-1630 (2000) 

RIKEN integrated sequence analysis (RISA) system — 384-format 
sequencing pipeline with 384 multicapillary sequencer. Genome Res. 
10 (11), 1757-1771 (2000) 

Computer-based methods for the mouse full-length cDNA 
encyclopedia: real-time sequence clustering for construction of a 
nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001) 

cDNA library was prepared and sequenced in Mouse Genome 
Encyclopedia Project of Genome Exploration Research Group in Riken 
Genomic Sciences Center and Genome Science Laboratory in RIKEN. 
Division of Experimental Animal Research in Riken contributed to 
prepare mouse tissues. 

Please visit our web site (http://genome.gsc.riken.go.jp) for 
further details. 
FEATURES Location/Qualifiers
source 1..372

/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6J"
/db_xref="taxon:10090"
/clone="L930044F15"
/tissue_type="whole body"
/dev_stage="17.5 days embryo"

/clone_lib="RIKEN full-length enriched, 17.5 days embryo 
whole body" 

ORIGIN 



Query Match 21.2%; Score 240.6; DB 13; Length 372; 

Best Local Similarity 79.4%; Pred. No. 4.2e-55; 

Matches 285; Conservative 0; Mismatches 74; Indels 0; Gaps 0; 

Qy 1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60

I I I I I I I I I I I Ml III I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 14 ATGAGCACAGACTGTGCGGGCAACTCCACCTGCCCTGTCAACAGTACGGAGGAAGACCCG 73 

Qy 61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120 

I I I I I! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 74 CCCGTGGGAATGGAGGGCCATGCGAATCTAAAGCTGCTTTTTACAGTGCTCTCGGCTGTG 133 

Qy 121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 134 ATGGTGGGTTTGGTCATGTTCTCTTTTGGATGTTCTGTGGAGAGTCAGAAGCTCTGGTTG 193 

Qy 181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240

III III I I I I I I I I II I I I I I I I I I I Mill I I I I I I I I I I II I I I I I I 

Db 194 CACCTCACAAGACCCTGGGGCATCCCAGTGGGCCTGCTTTCCCAGTTTGGACTTATGCCT 253 

Qy 241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I Ml I I I I ! I II I I I I I I I I 

Db 254 CTGACAGCTTATCTGTTAGCCATTGGCTTCGGTCTGAAACCATTCCAAGCTATTGCTGTC 313 

Qy 301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGA 359 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
Db 314 CTCATGATGGGGAGCTGCCCTGGGGGCACCATCTCTAATGTTCTCACCTTCTGGGTTGA 372 
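
The three statistics lines printed for each hit can be cross-checked against one another. The sketch below is a minimal illustration and not GenCore's own code: the function names are hypothetical, the match score of 1 follows from the perfect score of 1134 for the 1134-base query, and the mismatch score of -0.6 and the gap cost of Gapop plus Gapext per gap position are assumptions inferred from the figures reported in this listing.

```python
# Hypothetical helpers; scoring constants are inferred from this listing,
# not taken from GenCore documentation.
MATCH = 1.0       # perfect score 1134 for a 1134-base query implies +1 per match
MISMATCH = -0.6   # assumption: 285 - 0.6 * 74 = 240.6 reproduces RESULT 12
GAP_OPEN = 10.0   # "Gapop 10.0" from the search header
GAP_EXT = 1.0     # "Gapext 1.0" from the search header


def expected_score(matches, mismatches, gap_lengths=()):
    """Score implied by the match, mismatch and gap counts (assumed scheme)."""
    gap_cost = sum(GAP_OPEN + GAP_EXT * n for n in gap_lengths)
    return MATCH * matches + MISMATCH * mismatches - gap_cost


def query_match_pct(score, perfect_score=1134):
    """'Query Match': score as a percentage of the perfect score."""
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches, mismatches, indels=0):
    """'Best Local Similarity': matches as a percentage of aligned columns."""
    return 100.0 * matches / (matches + mismatches + indels)


# RESULT 12: Matches 285; Mismatches 74; no gaps.
print(round(expected_score(285, 74), 1))
print(round(query_match_pct(240.6), 1))
print(round(best_local_similarity_pct(285, 74), 1))
```

With the RESULT 12 counts this reproduces the reported Score 240.6, Query Match 21.2% and Best Local Similarity 79.4%. Hits whose alignment contains ambiguity codes (N) need not follow the assumed scheme exactly: RESULT 15 reports 225.2 where the formula gives 224.6.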



RESULT 13 

CA353647 

LOCUS CA353647 620 bp mRNA linear EST 05-NOV-2002

DEFINITION 625196 NCCCWA 1RT Oncorhynchus mykiss cDNA clone 1RT74B18_D_A09 5',
mRNA sequence.
ACCESSION CA353647
VERSION CA353647.1 GI:24598818
KEYWORDS EST.
SOURCE Oncorhynchus mykiss (rainbow trout)

ORGANISM Oncorhynchus mykiss

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Actinopterygii; Neopterygii; Teleostei; Euteleostei;
Protacanthopterygii; Salmoniformes; Salmonidae; Oncorhynchus.

REFERENCE 1 (bases 1 to 620)
AUTHORS Rexroad,C.E. and Keele,J.W.
TITLE Sequence analysis of a rainbow trout normalized cDNA library
JOURNAL Unpublished (2002)
COMMENT Contact: Rexroad CE

USDA, ARS, National Center for Cool and Cold Water Aquaculture
11876 Leetown Road, Kearneysville, WV 25430, USA
Tel: 304 724 8340 x2129
Fax: 304 725 0351

Email: crexroad@ncccwa.ars.usda.gov

Single pass sequencing. Bases called with phred v0.020425.c and
trimmed with the aid of the trim_alt option. Vector identified by
crossmatch v0.990329.

Seq primer: AGCGGATAACAATTTCACACAGGA.
FEATURES Location/Qualifiers
source 1..620

/organism="Oncorhynchus mykiss"
/mol_type="mRNA"
/db_xref="taxon:8022"
/clone="1RT74B18_D_A09"
/tissue_type="pooled"
/lab_host="DH10B"
/clone_lib="NCCCWA 1RT"
/note="Vector: pCMV SPORT 6; Site_1: NotI; Site_2: SalI;
Library made from pooled tissue from brain, gill, liver,
spleen, muscle, and kidney."

ORIGIN 

Query Match 20.1%; Score 227.6; DB 14; Length 620; 

Best Local Similarity 62.6%; Pred. No. 2e-51; 

Matches 387; Conservative 0; Mismatches 229; Indels 2; Gaps 2; 

Qy 88 CTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGTTCTCTTTG 147

111 II III II I I I II I I II I I I I I I I I I I I I I I I I I I II
Db 1 CTGAGCCTAGTTCTCAGCATCGTGCTGACCGTCATGCTGGCCATGGTCATGTTCTCCATG 60

Qy 148 GGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGGGCATTGCT 207

M I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I
Db 61 GGCTGCACCGTGGAGGCCGGAAAGCTGTGGGGACACATCAAGAGGCCATGGGGAATTTTT 120

Qy 208 GTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGGCCATTAGC 267

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I

Db 121 ATCGGCTTCTTGTGCCAGTTCGGCATTA-GTCCTTCACCGCCTTCGCCCTGTCGCTGGCC 179

Qy 268 TTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCCCGGGGGGC 327

II II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I

Db 180 TTCAACGTGCTGCCCGTGCAGGCCGTCGTCATCATCATCATGGGCTGCTGTCCCGGTGGC 239

Qy 328 ACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGAT-CTCAGCATCAGTAT 386

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 240 TCCAGCTCTAATATCATTGCCTACTGGCTGGATGGAGACATGGCTCCTCAGTATCAGCAT 299

Qy 387 GACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATCTCTACAC 446

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I

Db 300 GACAGCCTGCTCCTCTATCCTGGCCCTGGGGATGATGCCTCTGTGTCTGCTCATCTACAC 359

Qy 447 CTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAATTACCCT 506

I I I I II I II II I I I I I I I I I I I I I I I I I I I I

Db 360 GTCTGTCTGGACCTCTGCTGACACCATCCAGATCCCCTACCAAAGCATAGGTATCACTTT 419

Qy 507 TGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGCCAAAACA 566

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I II

Db 420 GGTGTCCCTCCTCATCCCTGTCGCCCTGGGAATCTACGTCAAAAACAAGTGGCCCGAAAT 479

Qy 567 ATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGC 626

I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I III

Db 480 AGCTAAAAAGATCCTCAAGGTGGGTTCCATAGTTGGCCTCCTCCTCATCATCATAATAGC 539

Qy 627 AGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTCTGACCAT 686

I I I I I II I I II I I I I I I I I I I I I I I I I I I

Db 540 GGTGGTTGGTGGGGTGCTGTACCAGTCCTTCTGGACCATCTCTCCCTCTCTCTGGATCAT 599

Qy 687 CAGTTTCATCTTTCCTTT 704

II I I I I I I I M

Db 600 CGGAGCCATCTACCCCTT 617



RESULT 14 

BY135403 

LOCUS BY135403 378 bp mRNA linear EST 09-DEC-2002

DEFINITION BY135403 RIKEN full-length enriched, 17.5 days embryo whole body
Mus musculus cDNA clone L930061J09 5', mRNA sequence.
ACCESSION BY135403
VERSION BY135403.1 GI:26270955
KEYWORDS EST.
SOURCE Mus musculus (house mouse)

ORGANISM Mus musculus

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

REFERENCE 1 (bases 1 to 378)

AUTHORS Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S.,
Nikaido,I., Osato,N., Saito,R., Suzuki,H., Yamanaka,I.,
Kiyosawa,H., Yagi,K., Tomaru,Y., Hasegawa,Y., Nogami,A.,
Schonbach,C., Gojobori,T., Baldarelli,R., Hill,D.P., Bult,C.,
Hume,D.A., Quackenbush,J., Schriml,L.M., Kanapin,A., Matsuda,H.,
Batalov,S., Beisel,K.W., Blake,J.A., Bradt,D., Brusic,V.,
Chothia,C., Corbani,L.E., Cousins,S., Dalla,E., Dragani,T.A.,
Fletcher,C.F., Forrest,A., Frazer,K.S., Gaasterland,T.,
Gariboldi,M., Gissi,C., Godzik,A., Gough,J., Grimmond,S.,
Gustincich,S., Hirokawa,N., Jackson,I.J., Jarvis,E.D., Kanai,A.,
Kawaji,H., Kawasawa,Y., Kedzierski,R.M., King,B.L., Konagaya,A.,
Kurochkin,I.V., Lee,Y., Lenhard,B., Lyons,P.A., Maglott,D.R.,
Maltais,L., Marchionni,L., McKenzie,L., Miki,H., Nagashima,T.,
Numata,K., Okido,T., Pavan,W.J., Pertea,G., Pesole,G.,
Petrovsky,N., Pillai,R., Pontius,J.U., Qi,D., Ramachandran,S.,
Ravasi,T., Reed,J.C., Reed,D.J., Reid,J., Ring,B.Z., Ringwald,M.,
Sandelin,A., Schneider,C., Semple,C.A., Setou,M., Shimada,K.,
Sultana,R., Takenaka,Y., Taylor,M.S., Teasdale,R.D., Tomita,M.,
Verardo,R., Wagner,L., Wahlestedt,C., Wang,Y., Watanabe,Y.,
Wells,C., Wilming,L.G., Wynshaw-Boris,A., Yanagisawa,M., Yang,I.,
Yang,L., Yuan,Z., Zavolan,M., Zhu,Y., Zimmer,A., Carninci,P.,
Hayatsu,N., Hirozane-Kishikawa,T., Konno,H., Nakamura,M.,
Sakazume,N., Sato,K., Shiraki,T., Waki,K., Kawai,J., Aizawa,K.,
Arakawa,T., Fukuda,S., Hara,A., Hashizume,W., Imotani,K., Ishii,Y.,
Itoh,M., Kagawa,I., Miyazaki,A., Sakai,K., Sasaki,D., Shibata,K.,
Shinagawa,A., Yasunishi,A., Yoshino,M., Waterston,R., Lander,E.S.,
Rogers,J., Birney,E. and Hayashizaki,Y.

TITLE Analysis of the mouse transcriptome based on functional annotation
of 60,770 full-length cDNAs

JOURNAL Nature 420, 563-573 (2002)

MEDLINE 22354683
PUBMED 12466851
COMMENT Contact: Yoshihide Hayashizaki

Laboratory for Genome Exploration Research Group, RIKEN Genomic
Sciences Center (GSC), Yokohama Institute

The Institute of Physical and Chemical Research (RIKEN)

1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan

Tel: 81-45-503-9222

Fax: 81-45-503-9216

Email: genome-res@gsc.riken.go.jp,
URL: http://genome.gsc.riken.go.jp/

Aizawa,K., Akimura,T., Arakawa,T., Carninci,P., Fukuda,S.,
Hirozane,T., Imotani,K., Ishii,Y., Itoh,M., Kawai,J., Konno,H.,
Miyazaki,A., Murata,M., Nakamura,M., Nomura,K., Numazaki,R.,
Ohno,M., Sakai,K., Sakazume,N., Sasaki,D., Sato,K., Shibata,K.,
Shiraki,T., Tagami,M., Waki,K., Watahiki,A., Muramatsu,M. and
Hayashizaki,Y. Direct Submission

Computational Analysis of Full-Length Mouse cDNAs Compared with
Human Genome Sequences Mamm. Genome. 12, 673-677 (2001)

Normalization and subtraction of cap-trapper-selected cDNAs to
prepare full-length cDNA libraries for rapid discovery of new
genes. Genome Res. 10 (10), 1617-1630 (2000)

RIKEN integrated sequence analysis (RISA) system--384-format
sequencing pipeline with 384 multicapillary sequencer. Genome Res.
10 (11), 1757-1771 (2000)

Computer-based methods for the mouse full-length cDNA
encyclopedia: real-time sequence clustering for construction of a
nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)

cDNA library was prepared and sequenced in Mouse Genome
Encyclopedia Project of Genome Exploration Research Group in Riken
Genomic Sciences Center and Genome Science Laboratory in RIKEN.
Division of Experimental Animal Research in Riken contributed to
prepare mouse tissues.

Please visit our web site (http://genome.gsc.riken.go.jp) for
further details.
FEATURES Location/Qualifiers
source 1..378

/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6J"
/db_xref="taxon:10090"
/clone="L930061J09"
/tissue_type="whole body"
/dev_stage="17.5 days embryo"

/clone_lib="RIKEN full-length enriched, 17.5 days embryo
whole body"

ORIGIN

Query Match 20.0%; Score 227.2; DB 13; Length 378;

Best Local Similarity 78.8%; Pred. No. 2.1e-51;

Matches 271; Conservative 0; Mismatches 73; Indels 0; Gaps 0;



Qy 1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I

Db 35 ATGAGCACAGACTGTGCGGGCAACTCCACCTGCCCTGTCAACAGTACGGAGGAAGACCCG 94

Qy 61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120

I I I I I I I I I I I I I I I I I I Mil I I I I I I I I I I I I I I I I I I I I I I

Db 95 CCCGTGGGAATGGAGGGCCATGCGAATCTAAAGCTGCTTTTTACAGTGCTCTCGGCTGTG 154

Qy 121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180
I M MM II I I I I I I II M I I I I I II I I M I I I I I I I I I I I I I I MM !

Db 155 ATGGTGGGTTTGGTCATGTTCTCTTTTGGATGTTCTGTGGAGAGTCAGAAGCTCTGGTTG 214

Qy 181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240

Ml MM I I II I II I II II II II I Ml Mill I I II I II I I I M II II I I

Db 215 CACCTCAGAAGACCCTGGGGCATCGCAGGGGGCCTGCTTTCCCAGTTTGGACTTATGCCT 274

Qy 241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300

I I I I I I I I I I II I I I I I I I I I I I I I I I M III I I I I I I I I I I I I I I I I

Db 275 CTGACAGCTTATCTGTTAGCCATTGGCTTCGGTCTGAAACCATTCCAAGCTATTGCTGTC 334

Qy 301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTT 344

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III
Db 335 CTCATGATGGGGGGCTGCCCTGGGGGCACCATCTCTAATGTTCT 378



RESULT 15 

CA496399 

LOCUS CA496399 823 bp mRNA linear EST 14-NOV-2002

DEFINITION AGENCOURT_10812047 NCI_CGAP_ZKid1 Danio rerio cDNA clone
IMAGE:6792624 5', mRNA sequence.
ACCESSION CA496399
VERSION CA496399.1 GI:24959484
KEYWORDS EST.
SOURCE Danio rerio (zebrafish)

ORGANISM Danio rerio

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Actinopterygii; Neopterygii; Teleostei; Ostariophysi;
Cypriniformes; Cyprinidae; Danio.

REFERENCE 1 (bases 1 to 823)
AUTHORS NIH-MGC http://mgc.nci.nih.gov/.
TITLE National Institutes of Health, Mammalian Gene Collection (MGC)
JOURNAL Unpublished (1999)

COMMENT Contact: Robert Strausberg, Ph.D.
Email: cgapbs-r@mail.nih.gov
Tissue Procurement: Leonard I. Zon, M.D.
cDNA Library Preparation: Invitrogen Corp
cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
DNA Sequencing by: Agencourt Bioscience Corporation
Clone distribution: MGC clone distribution information can be
found through the I.M.A.G.E. Consortium/LLNL at:
http://image.llnl.gov
Plate: LLAM14298 row: c column: 23
High quality sequence stop: 706.

FEATURES Location/Qualifiers
source 1..823

/organism="Danio rerio"
/mol_type="mRNA"
/db_xref="taxon:7955"
/clone="IMAGE:6792624"
/lab_host="DH10B (T1-resistant)"
/clone_lib="NCI_CGAP_ZKid1"

/note="Organ: kidney; Vector: pCMV-SPORT6.1; Site_1:
EcoRV; Site_2: NotI; Cloned unidirectionally. Primer:
Oligo dT. Average insert size 1.8 kb. Constructed by J.
Wang (Research Genetics, Invitrogen Corp) from tissue
donated by L. Zon (Harvard University). Note: this is a
NCI_CGAP Library."

ORIGIN

Query Match 19.9%; Score 225.2; DB 14; Length 823;

Best Local Similarity 63.7%; Pred. No. 1e-50;

Matches 341; Conservative 0; Mismatches 194; Indels 0; Gaps 0;



Qy 95 TCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTT 154 

III II III I I I I II I I I I I I I I I I I I I I I I I I I I 

Db 264 TTGTGATGAGCGTTGCCATTACCGTCATGTTGGCCATGGTTATGTTTTCAATGGGCTGCA 323 

Qy 155 CCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGGGCATTGCTGTGGGAC 214

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 324 CTGTTGAGGCTAGAAAACTGTGGGGGCACGTTCGCAGACCCTGGGGCATTTTTATAGGTT 383 

Qy 215 TGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTC 274 

I I I I II I I I I I I I I I I I I I I I I I I II I I I I III I I II 

Db 384 TCCTTTGCCAGTTTGGCATCATGCCTTTCACAGCCTTCATACTTTCATTGCTTTTCAACG 443

Qy 275 TGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCT 334 

II I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I 

Db 444 TGCTGCCAGTCCAGGCGGTGGTCATCATCATCATGGGCTGCTGCCCTGGAGGATCAAGCT 503 

Qy 335 CTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCATCAGTATGACAACCT 394

I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 504 CTAATGTTTTCTGCTACTGGCTTGATGGAGACATGGACCTAAGCATCAGCATGACAGCGT 563

Qy 395 GTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATCTCTACACCTGGTCCT 454 

MM I I M I II II II I I I II II I I II I I I I I II I I II 

Db 564 GTTCTTCAATTTTGGCTCTGGGAATGATGCCTCTTTGTCTGCTCATCTACACCACAATCT 623 

Qy 455 GGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCC 514

I I I I - I II II M I I II I I I II II I I I I I I II I I

Db 624 GGACTGCAGGCGATGCGATCCAGATTCCTTACGACAATATTGGGATCACACTGGTGAGTT 683

Qy 515 TGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAA 574

II I I M I I I MM II I I II I III I I I I I I II II I I I I 

Db 684 TGCTTGTGCCTGTCGGTCTTGGGATGTTAGTGAAACACAAGTGGCCTAAAGCTGCCAAAA 743 

Qy 575 TCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGT 629

I I I I I I I II I I I I II M II II I I I III I I I I I I 

Db 744 AGATCCTNCAGGTTGGATCTGTGGTGGGAATCGTCCTCATCATCGTCATTGCAGT 798



Search completed: March 25, 2004, 18:53:51 
Job time : 3450 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 



Run on: March 25, 2004, 12:53:57 ; Search time 4659 Seconds

(without alignments)
10549.689 Million cell updates/sec

Title: US-10-091-628-1

Perfect score: 1134

Sequence: 1 atgagagccaattgttccag...acatcacttcatgtgaatag 1134

Scoring table: IDENTITY_NUC
Gapop 10.0 , Gapext 1.0



Searched: 3470272 seqs, 21671516995 residues

Total number of hits satisfying chosen parameters: 6940544



Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 
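
The throughput figure in this header can be reproduced from the other header values. The sketch below is only an arithmetic check under one assumption: that both strands of every database sequence are compared with the query, which is suggested by the hit total (6940544) being exactly twice the number of sequences searched (3470272). The function name is hypothetical.

```python
def million_cell_updates_per_sec(query_len, db_residues, seconds, strands=2):
    """Dynamic-programming cells filled per second, in millions.

    strands=2 is an assumption: each database sequence is taken to be
    searched in both orientations.
    """
    return query_len * db_residues * strands / seconds / 1e6


# Values from this search header: 1134-base query, 21671516995 residues, 4659 s.
rate = million_cell_updates_per_sec(1134, 21671516995, 4659)
print(f"{rate:.3f} million cell updates/sec")
```

Because the search time is reported in whole seconds, agreement with the printed 10549.689 can only be expected to within rounding.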

Database : GenEmbl:*

1: gb_ba:*
2: gb_htg:*
3: gb_in:*
4: gb_om:*
5: gb_ov:*
6: gb_pat:*
7: gb_ph:*
8: gb_pl:*
9: gb_pr:*
10: gb_ro:*
11: gb_sts:*
12: gb_sy:*
13: gb_un:*
14: gb_vi:*
15: em_ba:*
16: em_fun:*
17: em_hum:*
18: em_in:*
19: em_mu:*
20: em_om:*
21: em_or:*
22: em_ov:*
23: em_pat:*
24: em_ph:*
25: em_pl:*
26: em_ro:*
27: em_sts:*



28: em_un:*
29: em_vi:*
30: em_htg_hum:*
31: em_htg_inv:*
32: em_htg_other:*
33: em_htg_mus:*
34: em_htg_pln:*
35: em_htg_rod:*
36: em_htg_mam:*


37: em_htg_vrt:*
38: em_sy:*
39: em_htgo_hum:*
40: em_htgo_mus:*
41: em_htgo_other:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
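
The scores referred to above are local alignment scores. As a minimal sketch of how such a score can be computed, assuming that "sw model" in the header denotes Smith-Waterman local alignment with affine gaps, the function below implements the standard Gotoh recurrences. It is not GenCore's implementation. The gap parameters are the header's Gapop and Gapext; the match score of 1 and mismatch score of -0.6 are assumptions inferred from the reported scores, and a gap of length n is assumed to cost Gapop + n x Gapext.

```python
def sw_score(a, b, match=1.0, mismatch=-0.6, gap_open=10.0, gap_ext=1.0):
    """Best local alignment score of a and b with affine gap costs (Gotoh)."""
    neg = float("-inf")
    prev_h = [0.0] * (len(b) + 1)   # H: best score ending at (i-1, j)
    prev_f = [neg] * (len(b) + 1)   # F: best score ending in a vertical gap
    best = 0.0
    for i in range(1, len(a) + 1):
        cur_h = [0.0] * (len(b) + 1)
        cur_f = [neg] * (len(b) + 1)
        e = neg                     # E: best score ending in a horizontal gap
        for j in range(1, len(b) + 1):
            e = max(e - gap_ext, cur_h[j - 1] - gap_open - gap_ext)
            cur_f[j] = max(prev_f[j] - gap_ext, prev_h[j] - gap_open - gap_ext)
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur_h[j] = max(0.0, prev_h[j - 1] + s, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


# A sequence aligned to itself scores its own length, which is why the
# perfect score for the 1134-base query is 1134.
print(sw_score("ATGAGAGCCAATTGTTCCAG", "ATGAGAGCCAATTGTTCCAG"))
```

The function keeps only two rows of the dynamic-programming matrix, so memory grows with the length of the second sequence rather than with the product of both lengths.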

SUMMARIES

                  %
Result          Query
   No.   Score  Match  Length  DB  ID          Description

    1   1134    100.0    1134   9  AJ583502    AJ583502 Homo sapi
    2   1130.8   99.7   lol /  c D  AXo / o4 / U  aaj / o4 / u Sequence
    3    704.4   62.1    1122  10  AJ583504    AJ583504 Mus muscu
    4    657.2   58.0    1113  10  AJ583503    AJ583503 Rattus no
    5    655.8   57.8     987   6  AX574600    AX574600 Sequence
    6    377     33.2   23618   9  AC079237    AC079237 Homo sapi
    7    377     33.2  192263   9  AC093827    AC093827 Homo sapi
    8    375.4   33.1   65268   2  AC099847    AC099847 Homo sapi
    9    320.4   28.3    2263   6  AR033870    AR033870 Sequence
   10    320.4   28.3    2263   6  I32744      I32744 Sequence 1
   11    320.4   28.3    2263  10  CGU02028    U02028 Cricetulus
   12    309.6   27.3    1916   5  BC053189    BC053189 Danio rer
   13    307.8   27.1     974  10  D87059      D87059 House mouse
   14    307.8   27.1    1629  10  AB002693    AB002693 Mus muscu
   15    306.4   27.0    1116   4  OCSDBATRP   Z54357 O.cuniculus
   16    298.4   26.3    1047   4  CFA581082   AJ581082 Canis fam
   17    297.8   26.3    1047   6  AR033871    AR033871 Sequence
   18    297.8   26.3    1047   6  I32745      I32745 Sequence 3
   19    297.8   26.3    3779   6  AX589492    AX589492 Sequence
   20    297.8   26.3    3779   9  HSU10417    U10417 Homo sapien
   21    293.2   25.9    4269  10  RNU07183    U07183 Rattus norv
   22    265     23.4  243333   2  AC120684 1  AC120684 Rattus no
   23    265     23.4  247127   2  AC098523    AC098523 Rattus no
   24    261.8   23.1  215210  10  AL713989    AL713989 Mouse DNA
   25    217     19.1   65958   9  AC105413    AC105413 Homo sapi
   26    184     16.2    1212   4  OCU131361   AJ131361 Oryctolag
   27    183.2   16.2    1663   6  AX401950    AX401950 Sequence
   28    183.2   16.2    1663   6  AX827529    AX827529 Sequence
   29    183.2   16.2    1663  10  RATSBACT    M77479 Rattus norv
   30    182.6   16.1     543  11  G51602      G51602 SHGC-79180
   31    173.8   15.3    1411  10  MMU95132    U95132 Mus musculu
   32    173.8   15.3    1579  10  MMU95131    U95131 Mus musculu
   33    173.8   15.3    1596  10  AB003303    AB003303 Mouse mRN
   34    173.8   15.3    1649  10  BC021154    BC021154 Mus muscu
   35    173.6   15.3    1580   6  AX409529    AX409529 Sequence
   36    173.6   15.3    1580   9  HUMNTCP     L21893 Human Na/ta
   37    141.4   12.5    1988   6  AX921120    AX921120 Sequence
   38    138.2   12.2    1437   9  AK126542    AK126542 Homo sapi
   39    138.2   12.2   1 6R6   9  BC012048    BC012048 Homo sapi
   40    138.2   12.2  1 7 OS   9  BC019066    BC019066 Homo sapi
   41    133.2   11.7     976   4  AY292653    AY292653 Oryctolag
   42    133.2   11.7   27247   4  OCAJ2005    AJ002005 Oryctolag
   43    131     11.6  152080   5  AL953877    AL953877 Zebrafish
   44    118.8   10.5    2020   6  AX589493    AX589493 Sequence
   45    118.8   10.5    2020   9  HSISDBA1    U67669 Human ileal
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RESULT 1 
AJ583502 
LOCUS AJ583502 1134 bp mRNA linear PRI 24-SEP-2003

DEFINITION Homo sapiens mRNA for sodium-dependent organic anion transporter
(SOAT gene).
ACCESSION AJ583502
VERSION AJ583502.1 GI:35208820
KEYWORDS SOAT gene; sodium-dependent organic anion transporter.
SOURCE Homo sapiens (human)

ORGANISM Homo sapiens

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.

REFERENCE 1
AUTHORS Geyer,J. and Petzinger,E.
TITLE cloning of a sodium-dependent organic anion transporter (SOAT) from
human adrenal gland

JOURNAL Unpublished
REFERENCE 2 (bases 1 to 1134)
AUTHORS Geyer,J.
TITLE Direct Submission
JOURNAL Submitted (23-SEP-2003) Geyer Institute of Pharmacology and
Toxicology, University of Giessen, Frankfurter Str. 107, 35392
Giessen, GERMANY

FEATURES Location/Qualifiers

source 1..1134

/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/chromosome="4"
/tissue_type="adrenal gland"

gene 1..1134

/gene="SOAT"

CDS 1..1134

/gene="SOAT"
/codon_start=1
/evidence=experimental
/product="sodium-dependent organic anion transporter"
/protein_id="CAE47477.1"
/db_xref="GI:35208821"
/translation="MRANCSSSSACPANSSEEELPVGLEVHGNLELVFTVVSTVMMGL
LMFSLGCSVEIRKLWSHIRRPWGIAVGLLCQFGLMPFTAYLLAISFSLKPVQAIAVLI
MGCCPGGTISNIFTFWVDGDMDLSISMTTCSTVAALGMMPLCIYLYTWSWSLQQNLTI
PYQNIGITLVCLTIPVAFGVYVNYRWPKQSKIILKIGAVVGGVLLLVVAVAGVVLAKG
SWNSDITLLTISFIFPLIGHVTGFLLALFTHQSWQRCRTISLETGAQNIQMCITMLQL
SFTAEHLVQMLSFPLAYGLFQLIDGFLIVAAYQTYKRRLKNKHGKKNSGCTEVCHTRK
STSSRETNAFLEVNEEGAITPGPPGPMDCHRALEPVGHITSCE"

ORIGIN 

Query Match 100.0%; Score 1134; DB 9; Length 1134; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 1134; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 AT GAGAGC CAATT GTT C CAGCAGCT CAGC CT GCCCT GCCAACAGTT CAGAGGAGGAGCT G 60 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60 

Qy 61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 12 0 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I 
Db 61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120 

Qy 121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 18 0 

M I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 1 I 
Db 121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180 

Qy 181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240 

Qy 241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTG7WVGCCAGTCCAAGCTATTGCTGTT 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I 

Db 241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300 

Qy 301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I 

Db 3 01 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360 

Qy 361 GGAGAT AT G GAT CT C AG CAT CAGT AT GACAAC CT GT T C CAC C GT G GC C GC C CT GG GAAT G 420 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I 1 I I I I I I 
Db 361 GGAGAT AT GGAT CT CAG CAT CAGT AT GACAACCT GTT C CAC CGT GGC C GC C CT GGGAAT G 42 0 

Qy 421 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 480 

I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 421 AT GC CAC T CT G CAT TT AT CT CT ACAC CT GGT CCT GGAGT CT T CAG CAGAAT CT C AC C ATT 480 

Qy 4 81 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 540 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 481 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 540 

Qy 541 TAT GT GAAT T AC AG AT GGC C AAAAC AAT C C AAAAT CAT T C T C AAG AT TGGGGCCGTTGTT 600 

I I | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Db 541 TAT GT GAAT T AC AGAT GGC C AAAAC AAT C C AAAAT CAT T C T C AAGAT TGGGGCCGTTGTT 600 

Qy 601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 660 



Qy 



661 AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 720 



Db 



661 



! I I II I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 

AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 720 



Qy 721 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 780 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 721 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 780 

Qy 781 GAAACTGGAGCT CAGAAT ATTCAGAT GT GCAT CAC CAT GCT CCAGT T ATCTTT CACT GCT 840 

II I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 781 GAAACT GGAGCT CAGAAT ATT C AG AT GT GCAT CAC CAT GCT CCAGT T AT CT T T CACT GCT 840 

Qy 841 GAGC ACT T GGT C C AGAT GT T GAGT T T C C CACT GGC CT AT GGACT CT T C C AG CT GAT AGAT 900 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I 
Db 841 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 900 

Qy 901 G GAT T T CT TAT T GT T G C AG CAT AT C AGAC GT AC AAGAG GAGAT T GAAGAAC AAAC AT G GA 960 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I 
Db 901 G GAT T T C T TAT T GT T G C AG CAT AT C AG AC G T AC AAGAG GAGAT T GAAGAAC AAAC AT G G A 960 

Qy 961 AAAAAGAACT CAGGT T GC ACAGAAGT CT G CC ATAC GAG GAAAT CGACT T CT T C CAGAGAG 1020 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 961 AAAAAGAACT CAGGT T G C AC AG AAGT CT GC C AT AC GAG GAAAT C GACT T CT T C CAGAGAG 1020 

Qy 1021 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1080 

I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I II I I II I I I I I I I I I I II I I I I I I I I I 
Db 1021 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1080 

Qy 1081 AT GG AT T GC CAC AG GGCT CT C GAGC C AGT T GG C CAC AT CACT T CAT GT GAAT AG 1134 

I I I I I I I I I I I I I II II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1081 AT GGAT TGC CACAG GGCT C T C GAGC CAGT T GG C CACAT C ACT T CAT GT GAAT AG 1134 



RESULT 2 
AX575470 
LOCUS       AX575470 1517 bp DNA linear PAT 07-JAN-2003 
DEFINITION  Sequence 26 from Patent WO02077237. 
ACCESSION   AX575470 
VERSION     AX575470.1 GI:27552072 
KEYWORDS 
SOURCE      Homo sapiens (human) 
  ORGANISM  Homo sapiens 
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 
REFERENCE   1 
  AUTHORS   Lee,E.A., Ding,L., Baughn,M.R., Tribouley,C.M., Bruns,C.M., 
            Elliott,V.S., Walia,N.K., Forsythe,I.J., Raumann,B.E., Burford,N., 
            Lal,P.G., Thornton,M., Gandhi,A.R., Arvizu,C., Yao,M.G., Yue,H., 
            Xu,Y., Hafalia,A.J. and Ison,C.H. 
  TITLE     Transporters and ion channels 
  JOURNAL   Patent: WO 02077237-A 26 03-OCT-2002; 
            Incyte Genomics, Inc. (US) 
FEATURES             Location/Qualifiers 
     source          1..1517 
                     /organism="Homo sapiens" 
                     /mol_type="unassigned DNA" 
                     /db_xref="taxon:9606" 
                     /note="Incyte ID No: 7472881CB1" 
ORIGIN 



Query Match 99.7%; Score 1130.8; DB 6; Length 1517; 

Best Local Similarity 99.8%; Pred. No. 0; 

Matches 1132; Conservative 0; Mismatches 2; Indels 0; Gaps 0; 

Qy 1 AT GAGAGCCAATT GTT CCAGCAGCTCAGCCT GC CCT GCCAACAGTTCAGAGGAGGAGCTG 60 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 

Db 24 9 AT G AGAGC C AAT T GT T C C AG C AGCT C AGC CTGCCCTGC C AAC AGT T C AGAGGAGGAGCT G 308 

Qy 61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 12 0 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 309 CCAGTGGGACTGGAGGCGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGCCCACTGTG 368 

Qy 121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 18 0 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 369 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 428 

Qy 181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 24 0 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 429 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 48 8 

Qy 241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 4 89 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 54 8 

Qy 301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 549 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 608 

Qy 361 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 42 0 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 

Db 609 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 668 

Qy 421 AT GC CACT CT GCATTT AT CT CTACAC CT GGT C C T GGAGT CTT C AGCAGAAT CT CACCAT T 48 0 

I I 1 1 I I I I 1 1 I 1 1 I 1 1 I I I 1 1 1 i I I I I I 1 1 I I I I I 1 1 1 I I 1 1 1 1 1 1 1 I I I 1 1 1 I I I I I 1 1 

Db 669 AT GC C ACT CT GC ATT T AT CT CTACAC CT GGT CC T GGAGT C TT C AGCAGAAT CT CAC CATT 72 8 

Qy 481 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 54 0 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 729 CCT TAT CAGAAC AT AGGAAT TAG CCT T GT GT GC CT GACCATT C CT GT GGC CT T T GGT GT C 78 8 

Qy 541 TAT GT GAAT TACAGAT GG C CAAAACAAT C CAAAAT CATT CT C AAGAT T GG GG C C GT T GTT 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I 

Db 789 TAT GT GAAT TACAGAT GGC CAAAACAAT CCAAAAT CATT CT CAAGAT T GGGGC C GT T GTT 84 8 

Qy 601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 660 

I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 

Db 84 9 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 908 

Qy 661 AAT T C AG AC AT CAC C C T T C T G AC CAT C AGT T T CAT CTTTCCTTT GAT T G G C CAT G T CAC G 720 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 909 AAT T C AG AC AT CAC C C T T C T G AC CAT C AGT T T CAT CTTTCCTTT GAT T G G C CAT G T CAC G 968 

Qy 721 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 780 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 969 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 1028 



Qy 781 GAAACT GGAGCT CAGAAT AT T CAGAT GT G CAT CAC CAT G CT C CAGT TAT CT TT C ACT GCT 840 

I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1029 GAAACT GGAGCT CAGAAT AT T CAGAT GT GCAT CAC CAT GCTCC AGTT AT CTTT CACT GCT 108 8 

Qy 841 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 900 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 108 9 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 1148 

Qy 901 GGATTT CTT ATT GT T GCAGCAT AT CAGACGT ACAAGAGGAGATT GAAGAACAAACAT GGA 960 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 114 9 GGATTT CTTATT GTT GCAGCAT ATCAGACGTACAAGAGGAGATTGAAGAACAAACAT GGA 12 08 

Qy 961 AAAAAGAACT C AG GT T GC AC AG AAGT CT G CC AT AC GAGGAAAT C GACT T CT T C C AGAGAG 1020 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Db 12 09 AAAAAGAACTCAGGT TGCACAGAAGT CTGCCATAC GAGGAAAT CGACTTCTT C CAGAGAG 1268 

Qy 1021 AC C AAT GC CT T CT T GGAG GT GAAT GAAGAAGGT G C CAT CAC TCCTGGGC CAC C AGG GC C A 1080 

I 1 1 1 1 | I I I I I I 1 1 I 1 1 I 1 1 I 1 1 1 1 1 I 1 1 1 1 I I i I I I I I I I 1 1 1 1 I 1 1 1 1 I I I 1 1 1 I I I I 

Db 1269 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1328 

Qy 1081 AT GGAT T G C CAC AGG G CT CT C GAGC CAGT T GG C CAC AT CACT T CAT GT GAAT AG 1134 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1329 AT GGAT T GC CACAGGGCT CT C GAGC CAGT T GGC CAC AT C ACTT CAT GT GAAT AG 1382 



RESULT 3 
AJ583504 
LOCUS       AJ583504 1122 bp mRNA linear ROD 24-SEP-2003 
DEFINITION  Mus musculus mRNA for sodium-dependent organic anion transporter 
            (SOAT gene). 
ACCESSION   AJ583504 
VERSION     AJ583504.1 GI:35208824 
KEYWORDS    SOAT gene; sodium-dependent organic anion transporter. 
SOURCE      Mus musculus (house mouse) 
  ORGANISM  Mus musculus 
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 
REFERENCE   1 
  AUTHORS   Geyer,J., Godoy,J.R. and Petzinger,E. 
  TITLE     Cloning of a sodium-dependent organic anion transporter (SOAT) from 
            mouse liver 
  JOURNAL   Unpublished 
REFERENCE   2 (bases 1 to 1122) 
  AUTHORS   Geyer,J. 
  TITLE     Direct Submission 
  JOURNAL   Submitted (23-SEP-2003) Geyer J., Institute of Pharmacology and 
            Toxicology, University of Giessen, Frankfurter Str. 107, 35392 
            Giessen, GERMANY 
FEATURES             Location/Qualifiers 
     source          1..1122 
                     /organism="Mus musculus" 
                     /mol_type="mRNA" 
                     /strain="C57BL/6J" 
                     /db_xref="taxon:10090" 
                     /chromosome="5" 
                     /tissue_type="liver" 
     gene            1..1122 
                     /gene="SOAT" 
     CDS             1..1122 
                     /gene="SOAT" 
                     /codon_start=1 
                     /product="sodium-dependent organic anion transporter" 
                     /protein_id="CAE47479.1" 
                     /db_xref="GI:35208825" 
                     /translation="MSTDCAGNSTCPVNSTEEDPPVGMEGHANLKLLFTVLSAVMVGL 
                     VMFSFGCSVESQKLWLHLRRPWGIAVGLLSQFGLMPLTAYLLAIGFGLKPFQAIAVLM 
                     MGSCPGGTISNVLTFWVDGDMDLSISMTTCSTVAALGMMPLCLYIYTRSWTLTQNLVI 
                     PYQSIGITLVSLVVPVASGVYVNYRWPKQATVILKVGAILGGMLLLVVAVTGMVLAKG 
                     WNTDVTLLVISCIFPLVGHVTGFLLAFLTHQSWQRCRTISIETGAQNIQLCIAMLQLS 
                     FSAEYLVQLLNFALAYGLFQVLHGLLIVAAYQAYKRRQKSKCRRQHPDCPDVCYEKQP 
                     RETSAFLDKGDEAAVTLGPVQPEQHHRAAELTSHIPSCE" 
ORIGIN 



Query Match 62.1%; Score 704.4; DB 10; Length 1122; 

Best Local Similarity 77.7%; Pred. No. 2.2e-204; 

Matches 881; Conservative 0; Mismatches 241; Indels 12; Gaps 2; 
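
The percentages in the summary lines above appear to follow from the counts printed alongside them. The sketch below is an inference from the printed numbers, not a documented definition from the search software: it assumes Query Match is the score divided by the perfect score (1134 for this query) and Best Local Similarity is the match count divided by matches + mismatches + indels. The function names are hypothetical.

```python
def percent_query_match(score, perfect_score):
    """Assumed definition: alignment score as a percentage of the perfect score."""
    return round(100.0 * score / perfect_score, 1)


def percent_local_similarity(matches, mismatches, indels):
    """Assumed definition: matches as a percentage of all aligned columns."""
    aligned_columns = matches + mismatches + indels
    return round(100.0 * matches / aligned_columns, 1)


# Values taken from the RESULT 3 summary lines; perfect score from the report header.
query_match = percent_query_match(704.4, 1134)
local_similarity = percent_local_similarity(881, 241, 12)
print(query_match, local_similarity)
```

Under these assumptions the RESULT 3 figures reproduce the printed 62.1% and 77.7%.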

Qy 1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60 

I I I I I I I I I I I lit III I I I 1 I I I I I I I I I I I I I I I I I I I I I I 

Db 1 ATGAGCACAGACTGTGCGGGCAACTCCACCTGCCCTGTCAACAGTACGGAGGAAGACCCG 60 

Qy 61 C CAGT GGGACT GGAGGT GCAT GGAAACCT GGAGCTCGTTTT CACAGT GGT GTCCACT GT G 12 0 

I I I I I I I I I I I I I I I I I I I 1 II I I I I I I I I I I I I I I I I I I I I I I 

Db 61 CCCGTGGGAATGGAGGGCCATGCGAATCTAAAGCTGCTTTTTACAGTGCTCTCGGCTGTG 12 0 

Qy 121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180 

Ml I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 121 ATGGTGGGTTTGGTCATGTTCTCTTTTGGATGTTCTGTGGAGAGTCAGAAGCTCTGGTTG 180 

Qy 181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240 

III I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I II I I I I I I 

Db 181 CACCTCAGAAGACCCTGGGGCATCGCAGTGGGCCTGCTTTCCCAGTTTGGACTTATGCCT 240 

Qy 241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 30 0 

I I I I I I I II I I I I I I I I I I I I I I M I I I I III I I I I I I II I I I I I I I I 

Db 241 CT GACAGCT TAT C T GT T AGC CAT T GGCTT C GGT CT GAAAC CAT T CCAAG CT AT T GCT GT C 300 

Qy 301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360 

I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I 

Db 301 CTCATGATGGGGAGCTGCCCTGGGGGCACCATCTCTAATGTTCTCACCTTCTGGGTTGAT 360 

Qy 361 GGAGAT AT GGAT CT CAGCAT CAGT AT GACAAC CTGTTCCACCGT GGCCGCCCT GGGAAT G 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 361 GGAGAT AT G GAT CT CAGCAT CAGT AT GACAAC CT GT T CCACAGT GG CCGC C CT GGGAAT G 42 0 

Qy 421 AT GCCACT CTGCATTT AT CT CTAC AC CT GGT CCT GGAGT CTT CAGCAGAAT CT CAC CATT 4 80 

I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I 

Db 421 ATGCCTCTCTGCCTCTACATCTACACCCGGTCCTGGACTCTGACACAGAACCTCGTCATT 480 



Qy 481 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 540 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I III III 

Db 481 CCGTATCAGAGCATAGGAATTACCCTTGTGTCCCTGGTGGTTCCTGTGGCTTCTGGCGTC 540 



Qy 541 TAT GT GAATT ACAGAT G GC CAAAACAAT C CAAAAT CAT T CT CAAGAT TGGGGCCGTTGTT 600 

I I I I I I M I I I II I I I I I I II III I I I II I I I I I I I I I M Ml M I 
Db 541 TAT GT GAAT T AT AGGT GGC CAAAGCAAGCAACGGT C ATT CT CAAGGT CGGAGC C AT T CT G 600 

Qy 601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I Ml 

Db 601 GGTGGCATGCTCCTCCTGGTGGTGGCAGTTACTGGCATGGTCCTGGCAAAAGG CTGG 657 

Qy 661 AAT T CAGAC AT C AC C CT T CT GAC CAT C AGT T T CAT CTTTCCTTT GAT T G GC C AT GT C AC G 72 0 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II 
Db 658 AAC AC AGAC GT C ACT CTTCTGGT CAT C AG CT GC AT TTTCCCCTTGGT C GGC CAT GT C AC A 717 

Qy 721 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 78 0 

II || I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 718 GGCTTCCTGCTGGCATTCCTCACCCACCAATCTTGGCAAAGGTGCAGGACCATTTCCATA 777 

Qy 7 81 GAAAC T GGAGCT C AGAAT AT T C AGAT GT GC AT C AC CAT G C T C C AGT T AT CT T T C ACT G CT 84 0 

I I I I I II I I I I I I II II III I I I I I I I I II I I I I I III I II III II I I I 
Db 778 GAGACTGGCGCTCAGAACATCCAGCTGTGCATCGCCATGCTGCAGCTGTCCTTCTCTGCT 837 

Qy 841 GAGCACTT GGT C CAGAT GTT GAGTTT CCCACTGGCCT AT GGACT CTT CCAGCT GAT AGAT 90 0 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I Ml I 

Db 838 GAGTACCTGGTCCAGCTGCTAAACTTTGCATTGGCCTATGGACTCTTCCAAGTGCTGCAC 897 

Qy 901 GGATTT CTTATT GTTGCAGCATATCAGACGTACAAGAGGAGATT GAAGAACAAACAT GGA 960 

II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II III I 

Db 898 GGG CT GCT C ATT GT C GCAGC AT AT CAGGC ATACAAGAGGAGGC AGAAGAGT AAAT GCAGG 957 

Qy 961 AAAAAGAACT CAGGTT GCACAGAAGT CT GCCAT ACGAGGAAAT CGACTT CTT CCAGAGAG 1020 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 

Db 958 AGACAGCAC CCGGATT GCCCAGACGT CTGCTACGAGAAGCA GC CCAGAGAG 1008 

Qy 1021 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1080 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1009 ACCAGTGCTTTCTTGGATAAAGGGGATGAGGCTGCCGTAACTCTGGGGCCAGTGCAGCCA 1068 

Qy 1081 AT G GAT T GC CAC AGGGCT CT C G AGC C AGTT G GC C AC AT C ACT T CAT GT GAAT AG 1134 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1069 GAGC AG CAC CACAGGGCT GCT GAGCT GACT AGC CACATT C CT T CAT GT GAAT AG 1122 



RESULT 4 
AJ583503 
LOCUS       AJ583503 1113 bp mRNA linear ROD 24-SEP-2003 
DEFINITION  Rattus norvegicus mRNA for sodium-dependent organic anion 
            transporter (SOAT gene). 
ACCESSION   AJ583503 
VERSION     AJ583503.1 GI:35208822 
KEYWORDS    SOAT gene; sodium-dependent organic anion transporter. 
SOURCE      Rattus norvegicus (Norway rat) 
  ORGANISM  Rattus norvegicus 
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; 
            Rattus. 
REFERENCE   1 
  AUTHORS   Geyer,J. and Petzinger,E. 
  TITLE     Cloning of a sodium-dependent organic anion transporter (SOAT) from 
            rat adrenal gland 
  JOURNAL   Unpublished 
REFERENCE   2 (bases 1 to 1113) 
  AUTHORS   Geyer,J. 
  TITLE     Direct Submission 
  JOURNAL   Submitted (23-SEP-2003) Geyer J., Institute of Pharmacology and 
            Toxicology, University of Giessen, Frankfurter Str. 107, 35392 
            Giessen, GERMANY 
FEATURES             Location/Qualifiers 
     source          1..1113 
                     /organism="Rattus norvegicus" 
                     /mol_type="mRNA" 
                     /strain="Wistar" 
                     /db_xref="taxon:10116" 
                     /chromosome="14" 
                     /tissue_type="adrenal gland" 
     gene            1..1113 
                     /gene="SOAT" 
     CDS             1..1113 
                     /gene="SOAT" 
                     /codon_start=1 
                     /product="sodium-dependent organic anion transporter" 
                     /protein_id="CAE47478.1" 
                     /db_xref="GI:35208823" 
                     /translation="MSADCEGNSTCPANSTEEDPPVGMEGQGSLKLVFTVLSAVMVGL 
                     VMFSFGCSVESRKLWLHLRRPWGIAVGLLCQFGLMPLTAYLLAIGFGLKPFQAIAVLI 
                     MGSCPGGTVSNVLTFWVDGDMDLSISMTTCSTVAALGMMPLCLYVYTRSWTLPQSLTI 
                     PYQSIGITLVSLVVPVASGIYVNYRWPKQATFILKVGAAVGGMLLLVVAVTGVVLAKG 
                     WNIDVTLLVISCIFPLVGHVMGFLLAFLTHQSWQRCRTISIETGAQNIQLCIAMMQLS 
                     FSAEYLVQLLNFALAYGLFQVLHGLLIVAAYQAYKRRQKSQYRRQHPECQDISSEKQP 
                     RETSAFLDKGAEAAVTLGLEQHHRTAELTSHVPSCE" 
ORIGIN 



Query Match 58.0%; Score 657.2; DB 10; Length 1113; 

Best Local Similarity 75.8%; Pred. No. 6.8e-190; 

Matches 860; Conservative 0; Mismatches 253; Indels 21; Gaps 3; 



Qy 1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60 

I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I Mill II I 

Db 1 ATGAGCGCAGACTGCGAGGGCAACTCCACCTGCCCTGCCAACAGCACGGAGGAAGACCCA 60 

Qy 61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 61 CCCGTGGGAATGGAGGGACAGGGGAGCCTGAAGCTTGTTTTCACAGTCCTGTCGGCTGTG 120 

Qy 121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I II I I 

Db 121 ATGGTGGGTCTGGTCATGTTCTCCTTTGGATGTTCAGTGGAGAGTCGGAAGCTCTGGCTG 180 

Qy 181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240 

Ml It I I I I I I I I II I I I I I I II I I I I I Mill I I I I I I I I I I I I I I I I I II I I 

Db 181 CACCTCAGAAGACCCTGGGGCATCGCAGTGGGCCTGCTTTGCCAGTTTGGGCTCATGCCT 240 

Qy 241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300 

I I I M I I II I I I II I I I I I I I I I I I II I I I III I I I I I I I I I I I I I I I 

Db 241 CTGACAGCTTATCTGCTAGCCATTGGCTTCGGTCTGAAACCATTCCAAGCTATTGCCGTC 300 

Qy 301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 301 CTCATCATGGGGAGCTGCCCTGGGGGCACCGTCTCTAATGTCCTCACCTTCTGGGTTGAT 360 

Qy 361 G GAG AT AT G GAT C T C AG CAT C AG T AT G AC AAC C T GT T C C AC CGTGGCCGCCCTGG G AAT G 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I 

Db 361 GGAGATATGGACCTCAGCAT CAGCAT GACGACCT GCT CCACAGT GGCTGCT CT GGGAATG 420 

Qy 421 AT GC C ACT CT G CAT T TAT CT C T ACAC CT GGT C CT G GAGT CT T C AGCAGAAT CT C AC CAT T 48 0 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I 

Db 421 ATGCCCCTCTGCCTCTACGTCTACACCCGGTCCTGGACTCTTCCACAGAGCCTCACCATC 48 0 

Qy 4 81 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 54 0 

I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I II II 

Db 481 CCGTACCAGAGCATAGGAATTACCCTTGTGTCCCTGGTTGTTCCTGTGGCCTCCGGCATC 54 0 

Qy 541 TAT GT G AAT T AC AG AT G G C C AAAAC AAT C C AAAAT CAT T C T C AAG AT TGGGGCCGTTGTT 60 0 

I I I I I I I I I I I II I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 541 TATGTGAATTATAGGTGGCCAAAGCAAGCAACATTCATTCTCAAGGTCGGGGCTGCTGTT 600 

Qy 601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 660 

I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I III 

Db 601 GGCGGCATGCTCCTCCTGGTGGTGGCAGTTACCGGCGTGGTCCTGGCAAAGGG CTGG 657 

Qy 661 AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 72 0 

II III I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 658 AACATAGATGTCACTCTTCTGGTCATCAGCTGTATTTTTCCCTTGGTCGGCCATGTCATG 717 

Qy 721 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 780 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I. I I I I I I I I I I 
Db 718 GGCTTCCTGCTGGCGTTCCTCACCCACCAGTCTTGGCAAAGGTGCAGGACGATTTCCATA 777 

Qy 781 GAAACT GGAGCT CAGAAT ATT CAGAT GT GCAT CACCATGCT CCAGTT AT CTTT CACT GCT 84 0 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 778 GAGACCGGAGCACAGAACATCCAGCTGTGCATTGCCATGATGCAGCTGTCCTTCTCTGCT 837 

Qy 841 GAGCACTT GGT CCAGAT GTT GAGT TT CCCACT GGCCT AT GGACT CTT C CAGCT GAT AGAT 900 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I 

Db 838 GAGTACCTGGTCCAGCTGTTAAACTTCGCCCTGGCCTACGGACTCTTCCAAGTGCTGCAC 897 

Qy 901 G GAT T T C T TAT T GT T G C AG CAT AT C AG AC G T AC AAG AG GAG AT T GAAG AAC AAAC AT G G A 960 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I 

Db 898 GG GCT GCT CAT T GT CG CAGCAT AT C AGGC AT ACAAGAGGAGGCAGAAGAGT CAAT AC AGG 957 

Qy 961 AAAAAGAACT CAGGT T GCACAGAAGTCTGCCAT ACGAGGAAAT CGACTT CTT CCAGAGAG 102 0 

I I I I I I I I III III I I I I I II I I I I I I I I I I 

Db 958 AGACAGCACC CGGAGT GCCAAGACAT CAGCTCT GAGAAGCA GC CCAGAGAG 1008 

Qy 1021 AC CAAT G C CT T CT T GGAG GT GAAT GAAGAAG GT GC C AT CACT C CT GGGC C AC C AGG G C C A 108 0 

I I I I I I I I I II I I I II I I I I II II III II I I I I I 

Db 1009 ACCAGTGCCTTCTTG GATAAAGGGGCTGAGGCTGCTGTAACTCTGGGGCTA 1059 

Qy 1081 AT GGATT GC CACAGGGCT CT CGAGCCAGTT GGC CACAT CACTT CATGT GAATAG 1134 

II I I I I I I I I III I I I I I I I I I I I I I I I I I I I 

Db 1060 GAGCAGCAC CACAGGAC CGCT GAACT GAC CAGT CAC GTT C CT T CAT GT GAATAG 1113 






RESULT 5 
AX574600 
LOCUS       AX574600 987 bp DNA linear PAT 07-JAN-2003 
DEFINITION  Sequence 11 from Patent WO0233087. 
ACCESSION   AX574600 
VERSION     AX574600.1 GI:27551854 
KEYWORDS 
SOURCE      Homo sapiens (human) 
  ORGANISM  Homo sapiens 
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 
REFERENCE   1 
  AUTHORS   Edinger,S., Gerlach,V., Macdougall,J.R., Malyankar,U.M., 
            Smithson,G., Millet,I., Peyman,J.A., Stone,D.J., Gunther,E., 
            Ellerman,K., Shimkets,R.A., Padigaru,M., Guo,X., Patturajan,M., 
            Taupier,R.J., Burgess,C.E., Zerhusen,B.D., Kekuda,R., Spytek,K.A., 
            Gangolli,E.A., Fernandes,E.R. and Gorman,L. 
  TITLE     Proteins and nucleic acids encoding same 
  JOURNAL   Patent: WO 0233087-A 11 25-APR-2002; 
            Curagen Corporation (US) 
FEATURES             Location/Qualifiers 
     source          1..987 
                     /organism="Homo sapiens" 
                     /mol_type="unassigned DNA" 
                     /db_xref="taxon:9606" 
ORIGIN 



Query Match 57.8%; Score 655.8; DB 6; Length 987; 

Best Local Similarity 86.3%; Pred. No. 1.8e-189; 

Matches 803; Conservative 0; Mismatches 77; Indels 51; Gaps 5; 



Qy 1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60 

Qy 61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTATC 120 

Qy 121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180 

Qy 181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240 

Qy 241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300 

Qy 301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 301 CTCATCATGGGCTGCTG-CCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 359 

Qy 361 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 420 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I 

Db 360 GGAGATATGGATCTCA-----------------------------GGTGCCCTGGGAATG 390 



Qy 421 AT GCCACT CT GC AT TT AT CT CT ACAC CT G GT CCT GGAGT CT T CAGC AGAAT CT CAC CATT 4 80 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 391 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 450 

Qy 481 C CT TAT CAGAAC A TAGGAATTACCCTTGTGTGCCTGACCATTCCTGTG 52 8 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 451 CCTTATCAGAACATAGGTCTGTCTTTAGGAATTACCCTTGTGTGCCTGACCATTCCTGTG 510 

Qy 529 GC CTT T GGT GT CT AT GT GAATT AC AGAT GGC C AAAACAAT C CAAAAT CAT T CT CAAGAT T 588 

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I M I 
Db 511 GCCTTTGGTGTC TAT GT G AAT T AC AG AT GGC C AAAACAAT C CAAAAT CAT T C T C AA 566 

Qy 589 GGGGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCG 648 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 

Db 567 — GGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCG 624 

Qy 64 9 AAAG GAT C T T G G AAT T C AGAC AT CAC C C T T C T G AC CAT C AGT T T CAT CTTTCCTTT GAT T 7 08 

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 62 5 AAAG GAT C T T G G AAT T C AG AC AT CAC C C T T C T G AC CAT C AG T T T CAT CTTTCCTTT GAT T 684 

Qy 709 GGCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGG 768 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 68 5 GGCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGACCTTG 74 4 

Qy 769 ACAATTT CCTTAGAAACT GGAGCT C AGAAT ATT C AG AT GT GCAT CAC CAT GCT CCAGTTA 82 8 

I I I I I I I I I I I I I Mil'- II I I 

Db 745 CCTATCTTTTTAG GTTTAGCTTTCAAGACACCCTGTGATACCCTACTCGCAATGACT 801 

Qy 82 9 TCTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTC 888 

I I I I I I I I I I I M II I I II I I I 

Db 802 TCGTGTCCTGAATGTTCCAGGCTCATCTATGCCTTCATTCCTCTGCTATATGGACTCTTC 861 

Qy 88 9 C AGC T GAT AGAT GGAT T T CT T AT T GT T G C AG 919 

I I II I I I I I I I I I II II I I I I II I II I I II 
Db 8 62 C AGC T GAT AG AT GGAT T T C T TAT T GT T GAAG 892 



RESULT 6 

AC079237/c 
LOCUS       AC079237 23618 bp DNA linear PRI 21-FEB-2002 
DEFINITION  Homo sapiens BAC clone RP11-711J3 from 4, complete sequence. 
ACCESSION   AC079237 
VERSION     AC079237.7 GI:18482358 
KEYWORDS    HTG. 
SOURCE      Homo sapiens (human) 
  ORGANISM  Homo sapiens 
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 
REFERENCE   1 (bases 1 to 23618) 
  AUTHORS   Sulston,J.E. and Waterston,R. 
  TITLE     Toward a complete human genome sequence 
  JOURNAL   Genome Res. 8 (11), 1097-1108 (1998) 
  MEDLINE   99063792 
   PUBMED   9847074 
REFERENCE   2 (bases 1 to 23618) 
  AUTHORS   Radionenko,M. and Meyer,R. 
  TITLE     The sequence of Homo sapiens BAC clone RP11-711J3 
  JOURNAL   Unpublished (2001) 
REFERENCE   3 (bases 1 to 23618) 
  AUTHORS   Waterston,R.H. 
  TITLE     Direct Submission 
  JOURNAL   Submitted (24-AUG-2000) Genome Sequencing Center, Washington 
            University School of Medicine, 4444 Forest Park Parkway, St. Louis, 
            MO 63108, USA 
REFERENCE   4 (bases 1 to 23618) 
  AUTHORS   Waterston,R.H. 
  TITLE     Direct Submission 
  JOURNAL   Submitted (03-FEB-2002) Genome Sequencing Center, Washington 
            University School of Medicine, 4444 Forest Park Parkway, St. Louis, 
            MO 63108, USA 
REFERENCE   5 (bases 1 to 23618) 
  AUTHORS   Waterston,R. 
  TITLE     Direct Submission 
  JOURNAL   Submitted (21-FEB-2002) Department of Genetics, Washington 
            University, 4444 Forest Park Avenue, St. Louis, Missouri 63108, USA 
COMMENT     On Feb 3, 2002 this sequence version replaced gi:18151062. 
Genome Center 

Center: Washington University Genome Sequencing Center 

Center code: WUGSC 

Web site: http://genome.wustl.edu/gsc 
Contact: sapiens@watson.wustl.edu 

Summary Statistics 

Center project name: H_NH071U03 



NOTICE: This sequence may not represent the entire insert of this 
clone. It may be shorter because we only sequence overlapping 
clone sections once, or longer because we provide a small overlap 
between neighboring data submissions. 

This sequence was finished as follows unless otherwise noted: 
all regions were double stranded, sequenced with an alternate 
chemistry, or covered by high quality data (i.e., phred quality >= 
30); an attempt was made to resolve all sequencing problems, such 
as compressions and repeats; all regions were covered by sequence 
from more than one subclone; and the assembly was confirmed by 
restriction digest. 

MAPPING INFORMATION: 

Mapping information for this clone was provided by Dr. John D. 
McPherson, Department of Genetics, Washington University, St. Louis 
MO. For additional information about the map position of this 
sequence, see http://genome.wustl.edu/gsc 

SOURCE INFORMATION: 

The RPCI-11 human BAC library was made from the blood of one male 
donor, as described by Osoegawa, K . , Woon, P . Y . , Zhao,B., Frengen,E., 
Tateno,M., Catanese, J. J. and de Jong, P.J. (1998) An improved 
approach for construction of bacterial artificial chromosome 
libraries. Genomics 51:1-8. The clone may be obtained either from 
Research Genetics, Inc. (http://www.resgen.com) or Pieter de Jong 
and coworkers at http://www.chori.org 



VECTOR: pBACe3.6 



NEIGHBORING SEQUENCE INFORMATION:

The clone sequenced to the left is RP11-64A1, 2000 bp overlap; the
clone sequenced to the right is RP11-397E7, 2000 bp overlap.
Actual start of this clone is at base position 1995 of RP11-64A1;
actual end is at base position 108789 of RP11-397E7.

FEATURES             Location/Qualifiers
     source          1..23618
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="4"
                     /map="4"
                     /clone="RP11-711J3"
                     /clone_lib="RPCI-11"
     repeat_region   595..720
                     /rpt_family="Alu"
     repeat_region   789..1014
                     /rpt_family="Alu"
     repeat_region   1039..1072
                     /rpt_family="(TAAA)n"
     repeat_region   1734..1936
                     /rpt_family="MIR"
     repeat_region   2029..2175
                     /rpt_family="L1"
     misc_feature    2339..2706
                     /note="similar to Sus scrofa EST BE031975 (NID:g8326984)"
     misc_feature    2484..2656
                     /note="match to EST BE181226 (NID:g8660402)"
     misc_feature    2499..2656
                     /note="similar to Mus musculus EST BB613812
                     (NID:g16454310)"
     repeat_region   2740..2892
                     /rpt_family="MIR"
     repeat_region   3617..3920
                     /rpt_family="Alu"
     repeat_region   4602..4890
                     /rpt_family="Alu"
     repeat_region   4892..4914
                     /rpt_family="AT_rich"
     repeat_region   5459..5624
                     /rpt_family="Alu"
     misc_feature    6280..6368
                     /note="match to EST BE181226 (NID:g8660402)"
     misc_feature    6280..6368
                     /note="similar to Mus musculus EST BB613812
                     (NID:g16454310)"
     repeat_region   6811..7098
                     /rpt_family="Alu"
     repeat_region   7515..7694
                     /rpt_family="MIR"
     misc_feature    7794..7912
                     /note="similar to Mus musculus EST BB613812
                     (NID:g16454310)"
     misc_feature    7794..7885
                     /note="match to EST BE181226 (NID:g8660402)"
     repeat_region   8198..8513
                     /rpt_family="L1"
     repeat_region   8514..8910
                     /rpt_family="MaLR"
     repeat_region   8911..9173
                     /rpt_family="L1"
     repeat_region   9184..9592
                     /rpt_family="MaLR"
     repeat_region   9873..10183
                     /rpt_family="Alu"
     repeat_region   10157..10266
                     /rpt_family="GA-rich"
     repeat_region   10822..11165
                     /rpt_family="MaLR"
     repeat_region   11242..11261
                     /rpt_family="(TTTTG)n"
     repeat_region   11243..11530
                     /rpt_family="Alu"
     repeat_region   11690..11833
                     /rpt_family="GA-rich"
     repeat_region   11870..11929
                     /rpt_family="L1"
     repeat_region   11916..11938
                     /rpt_family="AT_rich"
     repeat_region   12632..12696
                     /rpt_family="L1"
     repeat_region   12842..12935
                     /rpt_family="L2"
     repeat_region   13129..13424
                     /rpt_family="Alu"
     repeat_region   13401..13424
                     /rpt_family="(A)n"
     repeat_region   13607..13641
                     /rpt_family="L1"
     repeat_region   13642..13952
                     /rpt_family="Alu"
     repeat_region   13953..14359
                     /rpt_family="L1"
     repeat_region   14392..14416
                     /rpt_family="(T)n"
     repeat_region   14562..14588
                     /rpt_family="AT_rich"
     repeat_region   15001..15392
                     /rpt_family="L1"
     repeat_region   16436..16604
                     /rpt_family="MIR"
     repeat_region   16895..17214
                     /rpt_family="Alu"
     repeat_region   17186..17231
                     /rpt_family="(GAAAA)n"
     repeat_region   17275..17373
                     /rpt_family="(TTTC)n"
     repeat_region   17344..17657
                     /rpt_family="Alu"
     repeat_region   17713..17824
                     /rpt_family="L2"
     repeat_region   18000..18165
                     /rpt_family="L1"
     repeat_region   18255..18384
                     /rpt_family="MIR"
     repeat_region   18385..18693
                     /rpt_family="Alu"
     repeat_region   18666..18695
                     /rpt_family="AT_rich"
     repeat_region   18694..18807
                     /rpt_family="MIR"
     repeat_region   18881..19188
                     /rpt_family="MaLR"
     repeat_region   19189..19562
                     /rpt_family="MaLR"

Query Match 33.2%; Score 377; DB 9; Length 23618; 

Best Local Similarity 100.0%; Pred. No. 1.2e-103; 

Matches 377; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 
Qy       1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   23603 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 23544

Qy      61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   23543 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 23484

Qy     121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   23483 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 23424

Qy     181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   23423 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 23364

Qy     241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   23363 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 23304

Qy     301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   23303 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 23244

Qy     361 GGAGATATGGATCTCAG 377
           |||||||||||||||||
Db   23243 GGAGATATGGATCTCAG 23227



RESULT 7
AC093827/c
LOCUS       AC093827 192263 bp DNA linear PRI 01-MAR-2002
DEFINITION  Homo sapiens BAC clone RP11-397E7 from 4, complete sequence.
ACCESSION   AC093827 AC016973
VERSION     AC093827.3 GI:16328304
KEYWORDS    HTG.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 192263)
  AUTHORS   Sulston,J.E. and Waterston,R.
  TITLE     Toward a complete human genome sequence
  JOURNAL   Genome Res. 8 (11), 1097-1108 (1998)
  MEDLINE   99063792
   PUBMED   9847074
REFERENCE   2 (bases 1 to 192263)
  AUTHORS   Goyea,E., Meyer,R. and Dixon,R.
  TITLE     The sequence of Homo sapiens BAC clone RP11-397E7
  JOURNAL   Unpublished (2001)
REFERENCE   3 (bases 1 to 192263)
  AUTHORS   Waterston,R.H.
  TITLE     Direct Submission
  JOURNAL   Submitted (10-SEP-2001) Genome Sequencing Center, Washington
            University School of Medicine, 4444 Forest Park Parkway, St. Louis,
            MO 63108, USA
REFERENCE   4 (bases 1 to 192263)
  AUTHORS   Waterston,R.H.
  TITLE     Direct Submission
  JOURNAL   Submitted (23-OCT-2001) Genome Sequencing Center, Washington
            University School of Medicine, 4444 Forest Park Parkway, St. Louis,
            MO 63108, USA
REFERENCE   5 (bases 1 to 192263)
  AUTHORS   Waterston,R.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-MAR-2002) Department of Genetics, Washington
            University, 4444 Forest Park Avenue, St. Louis, Missouri 63108, USA
COMMENT     On Oct 23, 2001 this sequence version replaced gi:15809171.
Genome Center 

Center: Washington University Genome Sequencing Center 

Center code: WUGSC 

Web site: http://genome.wustl.edu/gsc 
Contact: sapiens@watson.wustl.edu

Summary Statistics 

Center project name: H_NH0397E07 
Drafting Center: WIBR 



NOTICE: This sequence may not represent the entire insert of this 
clone. It may be shorter because we only sequence overlapping 
clone sections once, or longer because we provide a small overlap 
between neighboring data submissions. 

This sequence was finished as follows unless otherwise noted: 
all regions were double stranded, sequenced with an alternate 
chemistry, or covered by high quality data (i.e., phred quality >= 
30); an attempt was made to resolve all sequencing problems, such 
as compressions and repeats; all regions were covered by sequence 
from more than one subclone; and the assembly was confirmed by 
restriction digest. 



MAPPING INFORMATION: 

Mapping information for this clone was provided by Dr. John D. 
McPherson, Department of Genetics, Washington University, St. Louis 
MO. For additional information about the map position of this 
sequence, see http://genome.wustl.edu/gsc 



SOURCE INFORMATION: 

The RPCI-11 human BAC library was made from the blood of one male 
donor, as described by Osoegawa,K., Woon,P.Y., Zhao,B., Frengen,E., 
Tateno,M., Catanese,J.J. and de Jong,P.J. (1998) An improved
approach for construction of bacterial artificial chromosome 
libraries. Genomics 51:1-8. The clone may be obtained either from 
Research Genetics, Inc. (http://www.resgen.com) or Pieter de Jong 
and coworkers at http://www.chori.org 
VECTOR: pBACe3.6

NEIGHBORING SEQUENCE INFORMATION: 

The clone sequenced to the left is RP11-711J3; the clone sequenced 
to the right is RP11-168E22. Actual start of this clone is at base 
position 1 of RP11-397E7; actual end is at base position 192263 of 
RP11-397E7. 



Data from AC079237 and AC093779 was used to finish this clone,
AC093827. Polymorphisms have been identified between AC079237 and
AC093827.

The sequence of AC016973 has been incorporated into AC093827.

FEATURES             Location/Qualifiers
     source          1..192263
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="4"
                     /map="4"
                     /clone="RP11-397E7"
                     /clone_lib="RPCI-11"
     repeat_region   1..194
                     /rpt_family="MIR"
     misc_feature    1553..1820
                     /note="similar to EST BE151388 (NID:g8614109)"
     repeat_region   3422..3792
                     /rpt_family="MaLR"
     repeat_region   3917..3942
                     /rpt_family="(T)n"
     repeat_region   4460..4772
                     /rpt_family="Alu"
     repeat_region   4750..4772
                     /rpt_family="(A)n"
     repeat_region   6150..6469
                     /rpt_family="MaLR"
     repeat_region   6470..6781
                     /rpt_family="Alu"
     repeat_region   6754..6795
                     /rpt_family="(GAAAA)n"
     repeat_region   6823..7127
                     /rpt_family="Alu"
     repeat_region   7245..7324
                     /rpt_family="L1"
     repeat_region   7386..7651
                     /rpt_family="Alu"
     repeat_region   7721..7831
                     /rpt_family="L1"
     repeat_region   7853..8154
                     /rpt_family="Alu"
     repeat_region   7967..7987
                     /rpt_family="AT_rich"
     repeat_region   8134..8165
                     /rpt_family="AT_rich"
     repeat_region   8269..8410
                     /rpt_family="L1"
     repeat_region   9108..9235
                     /rpt_family="MIR"
     repeat_region   9966..10007
                     /rpt_family="MIR"
     repeat_region   10729..11025
                     /rpt_family="Alu"
     repeat_region   11003..11025
                     /rpt_family="(A)n"
     repeat_region   11207..11818
                     /rpt_family="L1"
     repeat_region   11808..11850
                     /rpt_family="AT_rich"
     repeat_region   11851..12151
                     /rpt_family="Alu"
     misc_feature    12109..12705
                     /note="similar to EST BG619594 (NID:g13670965)"
     repeat_region   12132..12151
                     /rpt_family="(A)n"
     repeat_region   12788..12884
                     /rpt_family="Alu"
     repeat_region   13357..13649
                     /rpt_family="Alu"
     repeat_region   13735..14280
                     /rpt_family="L1"
     repeat_region   14376..14499
                     /rpt_family="Alu"
     repeat_region   14500..15875
                     /rpt_family="L1"
     repeat_region   15857..15880
                     /rpt_family="AT_rich"
     repeat_region   15876..16047
                     /rpt_family="Alu"
     repeat_region   16086..16231
                     /rpt_family="Alu"
     repeat_region   16254..16690
                     /rpt_family="L2"
     repeat_region   21991..22029
                     /rpt_family="AT_rich"
     repeat_region   22069..22413
                     /rpt_family="L2"
     repeat_region   22460..22610
                     /rpt_family="Alu"
     misc_feature    23018..23063
                     /note="similar to EST BI522604 (NID:g15347396)"
     misc_feature    23034..23652
                     /note="similar to EST BG707405 (NID:g13983721)"
     misc_feature    23040..23737
                     /note="similar to EST BG612893 (NID:g13664264)"
     misc_feature    23042..23907
                     /note="similar to EST BI549585 (NID:g15436897)"
     misc_feature    23042..23852
                     /note="similar to EST BG779925 (NID:g14050242)"
     misc_feature    23042..23791
                     /note="similar to EST BG401954 (NID:g13295402)"
     misc_feature    23042..23696
                     /note="similar to EST BG720731 (NID:g13999918)"
     misc_feature    23042..23491
                     /note="similar to EST BG530513 (NID:g13522050)"
     misc_feature    23043..23763
                     /note="similar to EST BF700412 (NID:g11985820)"
     misc_feature    23043..23273
                     /note="similar to EST BG771989 (NID:g14082642)"
     misc_feature    23044..23864
                     /note="similar to EST BG528480 (NID:g13520017)"
     misc_feature    23053..23874
                     /note="similar to EST BI838666 (NID:g15950216)"
     misc_feature    23055..23873
                     /note="similar to EST BG822780 (NID:g14170367)"
     misc_feature    23056..23907
                     /note="similar to EST BG680207 (NID:g13911604)"



Query Match 33.2%; Score 377; DB 9; Length 192263; 

Best Local Similarity 100.0%; Pred. No. 1.6e-103; 

Matches 377; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy       1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1985 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 1926

Qy      61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1925 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 1866

Qy     121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1865 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 1806

Qy     181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1805 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 1746

Qy     241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1745 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 1686

Qy     301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1685 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 1626

Qy     361 GGAGATATGGATCTCAG 377
           |||||||||||||||||
Db    1625 GGAGATATGGATCTCAG 1609



RESULT 8
AC099847
LOCUS       AC099847 65268 bp DNA linear HTG 22-NOV-2001
DEFINITION  Homo sapiens chromosome 18 clone RP11-819K4 map 18, LOW-PASS
            SEQUENCE SAMPLING.
ACCESSION   AC099847
VERSION     AC099847.1 GI:17047210
KEYWORDS    HTG; HTGS_PHASE0.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 65268)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C. and Lander,E.
  TITLE     Homo sapiens chromosome 18, clone RP11-819K4
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 65268)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C., Lander,E., Ali,A., Allen,N.,
            Anderson,S., Barna,N., Bastien,V., Boguslavkiy,L., Boukhgalter,B.,
            Brown,A., Camarata,J., Campopiano,A., Chang,J., Chazaro,B.,
            Choepel,Y., Colangelo,M., Collins,S., Collymore,A., Cook,A.,
            Cooke,P., DeArellano,K., Dewar,K., Diaz,J.S., Dodge,S., Faro,S.,
            Ferreira,P., FitzHugh,W., Gage,D., Galagan,J., Gardyria,S.,
            Ginde,S., Gord,S., Goyette,M., Graham,L., Grand-Pierre,N.,
            Hagos,B., Heaford,A., Horton,L., Hulme,W., Iliev,I., Johnson,R.,
            Jones,C., Kamat,A., Karatas,A., Kells,C., LaRocque,K.,
            Lamazares,R., Landers,T., Lehoczky,J., Levine,R., Liu,G.,
            MacLean,C., Macdonald,P., Major,J., Marquis,N., Matthews,C.,
            McCarthy,M., McEwan,P., McKernan,K., McPheeters,R., Meldrim,J.,
            Meneus,L., Mihova,T., Mlenga,V., Murphy,T., Naylor,J., Nguyen,C.,
            Norbu,C., Norman,C.H., O'Connor,T., O'Donnell,P., O'Neil,D.,
            Oliver,J., Peterson,K., Phunkhang,P., Pierre,N., Pollara,V.,
            Raymond,C., Retta,R., Rieback,M., Riley,R., Rise,C., Rogov,P.,
            Roman,J., Rosetti,M., Roy,A., Santos,R., Schauer,S., Schupback,R.,
            Seaman,S., Severy,P., Spencer,B., Stange-Thomann,N., Stojanovic,N.,
            Strauss,N., Subramanian,A., Talamas,J., Tesfaye,S., Theodore,J.,
            Topham,K., Travers,M., Travis,N., Trigilio,J., Vassiliev,H.,
            Viel,R., Vo,A., Wilson,B., Wu,X., Wyman,D., Ye,W.J., Young,G.,
            Zainoun,J., Zembek,L., Zimmer,A. and Zody,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (22-NOV-2001) Whitehead Institute/MIT Center for Genome
            Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT     All repeats were identified using RepeatMasker:
Smit, A.F.A. & Green, P. (1996-1997) 

http://ftp.genome.washington.edu/RM/RepeatMasker.html
Genome Center 

Center: Whitehead Institute/MIT Center for Genome Research

Center code: WIBR 

Web site: http://www-seq.wi.mit.edu 

Contact: sequence_submissions@genome.wi.mit.edu

Project Information 

Center project name: L13211 
Center clone name: 819 K 4 



NOTE: This record contains 81 individual 
sequencing reads that have not been assembled into 
contigs. Runs of N are used to separate the reads 
and the order in which they appear is completely 
arbitrary. Low-pass sequence sampling is useful for 



identifying clones that may be gene-rich and allows 
overlap relationships among clones to be deduced. 
However, it should not be assumed that this clone 
will be sequenced to completion. In the event that 
the record is updated, the accession number will 
Query Match 33.1%; Score 375.4; DB 2; Length 65268;
Best Local Similarity 99.7%; Pred. No. 4.3e-103;
Matches 376; Conservative 0; Mismatches 1; Indels 0; Gaps 0;

Qy       1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60
           ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
Db   50030 ATGAGAGCCCATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 50089

Qy      61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   50090 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 50149

Qy     121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   50150 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 50209

Qy     181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   50210 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 50269

Qy     241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   50270 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 50329

Qy     301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   50330 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 50389

Qy     361 GGAGATATGGATCTCAG 377
           |||||||||||||||||
Db   50390 GGAGATATGGATCTCAG 50406
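
The summary figures printed with each hit can be cross-checked against the alignment rows. The Python sketch below is illustrative only and is not part of the search program's output. It rebuilds the match line for the first 20 aligned bases of this hit and recomputes the two percentages. It assumes that Best Local Similarity is the number of matches divided by the number of aligned columns, and that Query Match is the hit score divided by the perfect score of 1134 given in the search header; both assumptions reproduce the printed 99.7% and 33.1%. The function names match_line and percent are hypothetical.

```python
def match_line(qy: str, db: str) -> str:
    """Return '|' under identical bases and ' ' under mismatches."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have equal length")
    return "".join("|" if a == b else " " for a, b in zip(qy, db))


def percent(numerator: float, denominator: float) -> float:
    """Percentage rounded to one decimal place, as printed in the report."""
    return round(100.0 * numerator / denominator, 1)


# First 20 aligned bases of this hit (Qy 1-20 against Db 50030-50049).
qy = "ATGAGAGCCAATTGTTCCAG"
db = "ATGAGAGCCCATTGTTCCAG"
print(match_line(qy, db))  # single gap in the bars at the A/C mismatch

# Assumed definitions (see the note above), using the figures for this hit.
print(percent(376, 377))     # Best Local Similarity: matches / aligned columns
print(percent(375.4, 1134))  # Query Match: score / perfect score
```

The same two ratios give 60.8% and 28.3% for a hit with 522 matches over 858 aligned columns and a score of 320.4.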
RESULT 9
AR033870
LOCUS       AR033870 2263 bp DNA linear PAT 29-SEP-1999
DEFINITION  Sequence 1 from patent US 5869265.
ACCESSION   AR033870
VERSION     AR033870.1 GI:5949475
KEYWORDS
SOURCE      Unknown.
  ORGANISM  Unknown.
            Unclassified.
REFERENCE   1 (bases 1 to 2263)
  AUTHORS   Dawson,P.A.
  TITLE     Ileal bile acid transporter compositions and methods
  JOURNAL   Patent: US 5869265-A 1 09-FEB-1999;
FEATURES             Location/Qualifiers
     source          1..2263
                     /organism="unknown"
                     /mol_type="unassigned DNA"
ORIGIN


Query Match 28.3%; Score 320.4; DB 6; Length 2263;
Best Local Similarity 60.8%; Pred. No. 2.1e-86;
Matches 522; Conservative 0; Mismatches 336; Indels 0; Gaps 0;

Qy      80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139
           | |  | |||     | ||  | |     |||  |||  |  |  | |   || | ||||
Db     188 ACGCCATCCTCAGCGTGGTGATGAGCACCGTGCTCACAATCCTCCTAGCCTTGGTGATGT 247

Qy     140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199
           | ||  |||| ||    |||||  |||  ||| |   |   ||| | ||| | || ||||
Db     248 TTTCCATGGGGTGCAATGTGGAACTCCACAAGTTTCTGGGACACCTAAGGCGGCCATGGG 307

Qy     200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259
           |||| |  |||||  | ||||| ||||||||  |||||||| | |||| ||   |||||
Db     308 GCATCGTCGTGGGCTTCCTCTGTCAGTTTGGAATCATGCCTCTCACAGGTTTCGTCCTGT 367

Qy     260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319
           || |   ||||    |    ||||| |||||| | |  || || |||  ||| |||||||
Db     368 CCGTGGCCTTTGGCATCCTCCCAGTGCAAGCTGTGGTGGTGCTGATCCAGGGTTGCTGCC 427

Qy     320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379
           | || || ||   ||| || ||  |  |||  ||||| ||||| || ||||| ||||||
Db     428 CTGGAGGAACTGCCTCCAATATCCTAGCCTATTGGGTAGATGGCGACATGGACCTCAGCG 487

Qy     380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439
           | || ||||| ||||| |||||  ||   ||||| ||||||||||| || ||| | | |
Db     488 TTAGCATGACCACCTGCTCCACGCTGCTTGCCCTTGGAATGATGCCCCTTTGCCTCTTCA 547

Qy     440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499
           |||| |||  |   |||  |        ||   |    ||||||||| | | ||| || |
Db     548 TCTATACCAAGATGTGGGTTGACTCAGGGACGATTGTGATTCCTTATGACAGCATTGGCA 607

Qy     500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559
            | | || ||    ||    ||||||||  || ||||  | ||||||||| ||| |||||
Db     608 CTTCTCTGGTTGCTCTTGTTATTCCTGTTTCCATTGGAATGTATGTGAATCACAAATGGC 667

Qy     560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619
           |  || || | || ||||| || || |||||  || | |  ||||   | ||| || |
Db     668 CCCAAAAAGCAAAGATCATACTTAAAATTGGATCCATCGCAGGTGCAATTCTCATTGTTC 727

Qy     620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679
           |  |||| || | ||| |   | |||    || |  | ||||     ||   ||  ||
Db     728 TCATCGCTGTGGTTGGAGGAATACTGTACCAAAGTGCCTGGACCATTGAACCCAAGCTGT 787

Qy     680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739
            ||  ||  |   ||| | |||| |   |||| | | |  ||| ||| | |||||
Db     788 GGATTATAGGAACCATATATCCTATAGCTGGCTACGGCCTGGGGTTTTTCCTGGCTAGAA 847

Qy     740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799
           || |    ||  | ||| | |||||| | ||| || |||| ||||| ||    ||||| |
Db     848 TTGCTGGTCAACCCTGGTACAGGTGCCGAACAGTTGCCTTGGAAACCGGGTTGCAGAACA 907

Qy     800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859
            |||| ||||   ||||||  | ||| | || ||||   ||||| || |   ||   |||
Db     908 CTCAGCTGTGTTCCACCATTGTGCAGCTTTCCTTCAGCCCTGAGGACCTCAACCTTGTGT 967

Qy     860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919
           | |  ||||| ||   |||  |  |||||||| |      || |    |  |  ||  ||
Db     968 TCACCTTCCCCCTCATCTACAGCATCTTCCAGATCGCCTTTGCAGCAATACTATTAGGAG 1027

Qy     920 CATATCAGACGTACAAGA 937
           | |||    | |||||||
Db    1028 CTTATGTCGCATACAAGA 1045



RESULT 10
I32744
LOCUS       I32744 2263 bp DNA linear PAT 06-FEB-1997
DEFINITION  Sequence 1 from patent US 5589358.
ACCESSION   I32744
VERSION     I32744.1 GI:1823535
KEYWORDS
SOURCE      Unknown.
  ORGANISM  Unknown.
            Unclassified.
REFERENCE   1 (bases 1 to 2263)
  AUTHORS   Dawson,P.A.
  TITLE     Ileal bile acid transporter compositions and methods
  JOURNAL   Patent: US 5589358-A 1 31-DEC-1996;
FEATURES             Location/Qualifiers
     source          1..2263
                     /organism="unknown"
                     /mol_type="unassigned DNA"
ORIGIN



Query Match 28.3%; Score 320.4; DB 6; Length 2263;
Best Local Similarity 60.8%; Pred. No. 2.1e-86;
Matches 522; Conservative 0; Mismatches 336; Indels 0; Gaps 0;

Qy      80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139
           | |  | |||     | ||  | |     |||  |||  |  |  | |   || | ||||
Db     188 ACGCCATCCTCAGCGTGGTGATGAGCACCGTGCTCACAATCCTCCTAGCCTTGGTGATGT 247

Qy     140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199
           | ||  |||| ||    |||||  |||  ||| |   |   ||| | ||| | || ||||
Db     248 TTTCCATGGGGTGCAATGTGGAACTCCACAAGTTTCTGGGACACCTAAGGCGGCCATGGG 307

Qy     200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259
           |||| |  |||||  | ||||| ||||||||  |||||||| | |||| ||   |||||
Db     308 GCATCGTCGTGGGCTTCCTCTGTCAGTTTGGAATCATGCCTCTCACAGGTTTCGTCCTGT 367

Qy     260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319
           || |   ||||    |    ||||| |||||| | |  || || |||  ||| |||||||
Db     368 CCGTGGCCTTTGGCATCCTCCCAGTGCAAGCTGTGGTGGTGCTGATCCAGGGTTGCTGCC 427

Qy     320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379
           | || || ||   ||| || ||  |  |||  ||||| ||||| || ||||| ||||||
Db     428 CTGGAGGAACTGCCTCCAATATCCTAGCCTATTGGGTAGATGGCGACATGGACCTCAGCG 487

Qy     380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439
           | || ||||| ||||| |||||  ||   ||||| ||||||||||| || ||| | | |
Db     488 TTAGCATGACCACCTGCTCCACGCTGCTTGCCCTTGGAATGATGCCCCTTTGCCTCTTCA 547

Qy     440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499
           |||| |||  |   |||  |        ||   |    ||||||||| | | ||| || |
Db     548 TCTATACCAAGATGTGGGTTGACTCAGGGACGATTGTGATTCCTTATGACAGCATTGGCA 607

Qy     500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559
            | | || ||    ||    ||||||||  || ||||  | ||||||||| ||| |||||
Db     608 CTTCTCTGGTTGCTCTTGTTATTCCTGTTTCCATTGGAATGTATGTGAATCACAAATGGC 667

Qy     560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619
           |  || || | || ||||| || || |||||  || | |  ||||   | ||| || |
Db     668 CCCAAAAAGCAAAGATCATACTTAAAATTGGATCCATCGCAGGTGCAATTCTCATTGTTC 727

Qy     620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679
           |  |||| || | ||| |   | |||    || |  | ||||     ||   ||  ||
Db     728 TCATCGCTGTGGTTGGAGGAATACTGTACCAAAGTGCCTGGACCATTGAACCCAAGCTGT 787

Qy     680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739
            ||  ||  |   ||| | |||| |   |||| | | |  ||| ||| | |||||
Db     788 GGATTATAGGAACCATATATCCTATAGCTGGCTACGGCCTGGGGTTTTTCCTGGCTAGAA 847

Qy     740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799
           || |    ||  | ||| | |||||| | ||| || |||| ||||| ||    ||||| |
Db     848 TTGCTGGTCAACCCTGGTACAGGTGCCGAACAGTTGCCTTGGAAACCGGGTTGCAGAACA 907

Qy     800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859
            |||| ||||   ||||||  | ||| | || ||||   ||||| || |   ||   |||
Db     908 CTCAGCTGTGTTCCACCATTGTGCAGCTTTCCTTCAGCCCTGAGGACCTCAACCTTGTGT 967

Qy     860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919
           | |  ||||| ||   |||  |  |||||||| |      || |    |  |  ||  ||
Db     968 TCACCTTCCCCCTCATCTACAGCATCTTCCAGATCGCCTTTGCAGCAATACTATTAGGAG 1027

Qy     920 CATATCAGACGTACAAGA 937
           | |||    | |||||||
Db    1028 CTTATGTCGCATACAAGA 1045



RESULT 11
CGU02028
LOCUS       CGU02028 2263 bp mRNA linear ROD 06-JUN-1994
DEFINITION  Cricetulus griseus Na+ dependent ileal bile acid transporter mRNA,
            complete cds.
ACCESSION   U02028
VERSION     U02028.1 GI:455032
KEYWORDS
SOURCE      Cricetulus griseus (Chinese hamster)
  ORGANISM  Cricetulus griseus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Cricetinae;
            Cricetulus.
REFERENCE   1 (bases 1 to 2263)
  AUTHORS   Wong,M.H., Oelkers,P., Craddock,A.L. and Dawson,P.A.
  TITLE     Expression cloning and characterization of the hamster ileal
            sodium-dependent bile acid transporter
  JOURNAL   J. Biol. Chem. 269 (2), 1340-1347 (1994)
  MEDLINE   94117449
   PUBMED   8288599
REFERENCE   2 (bases 1 to 2263)
  AUTHORS   Dawson,P.A.
  TITLE     Direct Submission
  JOURNAL   Submitted (22-SEP-1993) Paul A. Dawson, Dept Medicine/Section
            Gastroenterology, Bowman Gray School of Medicine, Wake Forest
            University, Medical Center Boulevard, Winston-Salem, NC, 27517, USA
FEATURES             Location/Qualifiers
     source          1..2263
                     /organism="Cricetulus griseus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10029"
                     /clone="clone pIBAT (44-1)"
                     /tissue_type="ileum"
                     /clone_lib="hamster ileal cDNA expression library"
                     /note="author cites additional common name: golden Syrian
                     hamster"
     CDS             109..1155
                     /codon_start=1
                     /product="Na+ dependent ileal bile acid transporter"
                     /protein_id="AAA18640.1"
                     /db_xref="GI:455033"
                     /translation="MDNSSICNPNATICEGDSCIAPESNFNAILSVVMSTVLTILLAL
                     VMFSMGCNVELHKFLGHLRRPWGIVVGFLCQFGIMPLTGFVLSVAFGILPVQAVVVLI
                     QGCCPGGTASNILAYWVDGDMDLSVSMTTCSTLLALGMMPLCLFIYTKMWVDSGTIVI
                     PYDSIGTSLVALVIPVSIGMYVNHKWPQKAKIILKIGSIAGAILIVLIAVVGGILYQS
                     AWTIEPKLWIIGTIYPIAGYGLGFFLARIAGQPWYRCRTVALETGLQNTQLCSTIVQL
                     SFSPEDLNLVFTFPLIYSIFQIAFAAILLGAYVAYKKCHGKNNTELQEKTDNEMEPRS
                     SFQETNKGFQPDEK"
     mat_peptide     109..1152
                     /product="Na+ dependent ileal bile acid transporter"
ORIGIN



Query Match 28.3%; Score 320.4; DB 10; Length 2263; 

Best Local Similarity 60.8%; Pred. No. 2.1e-86; 

Matches 522; Conservative 0; Mismatches 336; Indels 0; Gaps 0; 

Qy      80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139
           | |  | |||     | ||  | |     |||  |||  |  |  | |   || | ||||
Db     188 ACGCCATCCTCAGCGTGGTGATGAGCACCGTGCTCACAATCCTCCTAGCCTTGGTGATGT 247

Qy     140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199
           | ||  |||| ||    |||||  |||  ||| |   |   ||| | ||| | || ||||
Db     248 TTTCCATGGGGTGCAATGTGGAACTCCACAAGTTTCTGGGACACCTAAGGCGGCCATGGG 307

Qy     200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259
           |||| |  |||||  | ||||| ||||||||  |||||||| | |||| ||   |||||
Db     308 GCATCGTCGTGGGCTTCCTCTGTCAGTTTGGAATCATGCCTCTCACAGGTTTCGTCCTGT 367

Qy     260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319
           || |   ||||    |    ||||| |||||| | |  || || |||  ||| |||||||
Db     368 CCGTGGCCTTTGGCATCCTCCCAGTGCAAGCTGTGGTGGTGCTGATCCAGGGTTGCTGCC 427

Qy     320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379
           | || || ||   ||| || ||  |  |||  ||||| ||||| || ||||| ||||||
Db     428 CTGGAGGAACTGCCTCCAATATCCTAGCCTATTGGGTAGATGGCGACATGGACCTCAGCG 487

Qy     380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439
           | || ||||| ||||| |||||  ||   ||||| ||||||||||| || ||| | | |
Db     488 TTAGCATGACCACCTGCTCCACGCTGCTTGCCCTTGGAATGATGCCCCTTTGCCTCTTCA 547

Qy     440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499
           |||| |||  |   |||  |        ||   |    ||||||||| | | ||| || |
Db     548 TCTATACCAAGATGTGGGTTGACTCAGGGACGATTGTGATTCCTTATGACAGCATTGGCA 607

Qy     500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559
            | | || ||    ||    ||||||||  || ||||  | ||||||||| ||| |||||
Db     608 CTTCTCTGGTTGCTCTTGTTATTCCTGTTTCCATTGGAATGTATGTGAATCACAAATGGC 667

Qy     560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619
           |  || || | || ||||| || || |||||  || | |  ||||   | ||| || |
Db     668 CCCAAAAAGCAAAGATCATACTTAAAATTGGATCCATCGCAGGTGCAATTCTCATTGTTC 727

Qy     620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679
           |  |||| || | ||| |   | |||    || |  | ||||     ||   ||  ||
Db     728 TCATCGCTGTGGTTGGAGGAATACTGTACCAAAGTGCCTGGACCATTGAACCCAAGCTGT 787

Qy     680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739
            ||  ||  |   ||| | |||| |   |||| | | |  ||| ||| | |||||
Db     788 GGATTATAGGAACCATATATCCTATAGCTGGCTACGGCCTGGGGTTTTTCCTGGCTAGAA 847

Qy     740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799
           || |    ||  | ||| | |||||| | ||| || |||| ||||| ||    ||||| |
Db     848 TTGCTGGTCAACCCTGGTACAGGTGCCGAACAGTTGCCTTGGAAACCGGGTTGCAGAACA 907

Qy     800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859
            |||| ||||   ||||||  | ||| | || ||||   ||||| || |   ||   |||
Db     908 CTCAGCTGTGTTCCACCATTGTGCAGCTTTCCTTCAGCCCTGAGGACCTCAACCTTGTGT 967

Qy     860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919
           | |  ||||| ||   |||  |  |||||||| |      || |    |  |  ||  ||
Db     968 TCACCTTCCCCCTCATCTACAGCATCTTCCAGATCGCCTTTGCAGCAATACTATTAGGAG 1027

Qy     920 CATATCAGACGTACAAGA 937
           | |||    | |||||||
Db    1028 CTTATGTCGCATACAAGA 1045



RESULT 12
BC053189
LOCUS BC053189 1916 bp mRNA linear VRT 07-OCT-2003
DEFINITION Danio rerio cDNA clone MGC:63998 IMAGE:6792624, complete cds.
ACCESSION BC053189
VERSION BC053189.1 GI:31418837
KEYWORDS MGC.
SOURCE Danio rerio (zebrafish)
ORGANISM Danio rerio
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Actinopterygii; Neopterygii; Teleostei; Ostariophysi;
Cypriniformes; Cyprinidae; Danio.
REFERENCE 1 (bases 1 to 1916)
AUTHORS Strausberg,R.L., Feingold,E.A., Grouse,L.H., Derge,J.G.,
Klausner,R.D., Collins,F.S., Wagner,L., Shenmen,C.M., Schuler,G.D.,
Altschul,S.F., Zeeberg,B., Buetow,K.H., Schaefer,C.F., Bhat,N.K.,
Hopkins,R.F., Jordan,H., Moore,T., Max,S.I., Wang,J., Hsieh,F.,
Diatchenko,L., Marusina,K., Farmer,A.A., Rubin,G.M., Hong,L.,
Stapleton,M., Soares,M.B., Bonaldo,M.F., Casavant,T.L.,
Scheetz,T.E., Brownstein,M.J., Usdin,T.B., Toshiyuki,S.,
Carninci,P., Prange,C., Raha,S.S., Loquellano,N.A., Peters,G.J.,
Abramson,R.D., Mullahy,S.J., Bosak,S.A., McEwan,P.J.,
McKernan,K.J., Malek,J.A., Gunaratne,P.H., Richards,S.,
Worley,K.C., Hale,S., Garcia,A.M., Gay,L.J., Hulyk,S.W.,
Villalon,D.K., Muzny,D.M., Sodergren,E.J., Lu,X., Gibbs,R.A.,
Fahey,J., Helton,E., Ketteman,M., Madan,A., Rodrigues,S.,
Sanchez,A., Whiting,M., Madan,A., Young,A.C., Shevchenko,Y.,
Bouffard,G.G., Blakesley,R.W., Touchman,J.W., Green,E.D.,
Dickson,M.C., Rodriguez,A.C., Grimwood,J., Schmutz,J., Myers,R.M.,
Butterfield,Y.S., Krzywinski,M.I., Skalska,U., Smailus,D.E.,
Schnerch,A., Schein,J.E., Jones,S.J. and Marra,M.A.
TITLE Generation and initial analysis of more than 15,000 full-length
human and mouse cDNA sequences
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 99 (26), 16899-16903 (2002)
MEDLINE 22388257
PUBMED 12477932
REFERENCE 2 (bases 1 to 1916)
AUTHORS Strausberg,R.
TITLE Direct Submission
JOURNAL Submitted (02-JUN-2003) National Institutes of Health, Mammalian
Gene Collection (MGC), Cancer Genomics Office, National Cancer
Institute, 31 Center Drive, Room 11A03, Bethesda, MD 20892-2590,
USA
REMARK NIH-MGC Project URL: http://mgc.nci.nih.gov
COMMENT Contact: MGC help desk
Email: cgapbs-r@mail.nih.gov
Tissue Procurement: Leonard I. Zon, M.D.
cDNA Library Preparation: Invitrogen Corp
cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
DNA Sequencing by: Sequencing Group at the Stanford Human Genome
Center, Stanford University School of Medicine, Stanford, CA 94305
Web site: http://www-shgc.stanford.edu
Contact: (Dickson, Mark) mcd@paxil.stanford.edu
Dickson, M., Schmutz, J., Grimwood, J., Rodriquez, A., and Myers,
R. M.
Clone distribution: MGC clone distribution information can be found
through the I.M.A.G.E. Consortium/LLNL at: http://image.llnl.gov
Series: IRAK Plate: 117 Row: g Column: 20
This clone was selected for full length sequencing because it
passed the following selection criteria: Hexamer frequency ORF
analysis, Similarity but not identity to protein.
FEATURES Location/Qualifiers
source 1..1916
/organism="Danio rerio"
/mol_type="mRNA"
/db_xref="taxon:7955"
/clone="MGC:63998 IMAGE:6792624"
/tissue_type="Kidney, zebrafish"
/clone_lib="NCI_CGAP_ZKid1"
/lab_host="DH10B"
/note="Vector: pCMV-SPORT6.1"
CDS 168..1253
/codon_start=1
/product="Unknown (protein for MGC:63998)"
/protein_id="AAH53189.1"
/db_xref="GI:31418838"
/translation="MCTLEPVCPVNATICTGTSCLVPRDPFNDILSVVMSVAITVMLA
MVMFSMGCTVEARKLWGHVRRPWGIFIGFLCQFGIMPFTAFILSLLFNVLPVQAVVII
IMGCCPGGSSSNVFCYWLDGDMDLSISMTACSSILALGMMPLCLLIYTTIWTAGDAIQ
IPYDNIGITLVSLLVPVGLGMLVKHKWPKAAKKILKVGSVVGIVLIIVIAVIGGVLYQ
SSWTIAPSLWIIGTIYPFIGFGLGFLLARFVGQPWHRCRTIALETGMQNAQLASTITQ
LSFSPAELEVMFAFPLIYSIFQLVVAGIAVSIHYSIKRCRHQTLVEEDGEGTTEDCDK
HSYSLENGGFSCDENGNNQNKDKGTKL"
misc_feature 285..731
/note="SBF; Region: Sodium Bile acid symporter family.
This family consists of Na+/bile acid co-transporters.
These transmembrane proteins function in the liver in the
uptake of bile acids from portal blood plasma a process
mediated by the co-transport of Na+. Also in the family is
ARC3 from S. cerevisiae, a putative transmembrane protein
involved in resistance to arsenic compounds"
/db_xref="CDD:pfam01758"
ORIGIN

Query Match 27.3%; Score 309.6; DB 5; Length 1916; 

Best Local Similarity 60.4%; Pred. No. 4.2e-83; 

Matches 510; Conservative 0; Mismatches 334; Indels 0; Gaps 0; 
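
The three summary lines above are arithmetically related, and the relation can be checked with a short sketch. Everything here is an assumption read off the listing rather than stated by it: a match is taken to score +1 (the perfect score equals the query length, 1134), and the mismatch penalty of 0.6 is inferred from the reported scores. The class name `Hit` and its methods are hypothetical, and gap costs are ignored because the hits shown have no indels.

```python
from dataclasses import dataclass

PERFECT_SCORE = 1134     # "Perfect score" from the search header
MATCH_SCORE = 1.0        # assumption: IDENTITY_NUC scores a match as +1
MISMATCH_PENALTY = 0.6   # assumption: inferred from the listing, not stated in it


@dataclass
class Hit:
    matches: int
    mismatches: int
    indels: int = 0

    def score(self) -> float:
        # Ungapped hits only: gap costs are not modelled in this sketch.
        return self.matches * MATCH_SCORE - self.mismatches * MISMATCH_PENALTY

    def query_match_pct(self) -> float:
        # Score as a percentage of the perfect (self-match) score.
        return 100.0 * self.score() / PERFECT_SCORE

    def best_local_similarity_pct(self) -> float:
        # Identical positions as a percentage of all aligned positions.
        aligned = self.matches + self.mismatches + self.indels
        return 100.0 * self.matches / aligned


hit = Hit(matches=510, mismatches=334)   # figures from RESULT 12
print(round(hit.score(), 1),
      round(hit.query_match_pct(), 1),
      round(hit.best_local_similarity_pct(), 1))
```

Under these assumptions the same arithmetic also reproduces the figures reported for the hits with 522/357 and 523/361 matches/mismatches.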

Qy   95 TCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTT 154
        | ||  | |  || |     || || ||| |||   || | ||||| ||  |||| ||  
Db  265 TTGTGATGAGCGTTGCCATTACCGTCATGTTGGCCATGGTTATGTTTTCAATGGGCTGCA 324

Qy  155 CCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGGGCATTGCTGTGGGAC 214
        | || |||    | || ||||||  |||| |  | |||||||||||||||  | | ||  
Db  325 CTGTTGAGGCTAGAAAACTGTGGGGGCACGTTCGCAGACCCTGGGGCATTTTTATAGGTT 384

Qy  215 TGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTC 274
        | || |||||||||||  |||||||||| ||||| |   | ||  |  |    ||     
Db  385 TCCTTTGCCAGTTTGGCATCATGCCTTTCACAGCCTTCATACTTTCATTGCTTTTCAACG 444

Qy  275 TGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCT 334
        ||  ||||||||| ||  | |   |  ||||||||||||||||||| || ||  | | ||
Db  445 TGCTGCCAGTCCAGGCGGTGGTCATCATCATCATGGGCTGCTGCCCTGGAGGATCAAGCT 504

Qy  335 CTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCATCAGTATGACAACCT 394
        ||||  |||||  || |||| ||||||||||| ||||| || |||||||| |||||| | |
Db  505 CTAATGTTTTCTGCTACTGGCTTGATGGAGACATGGACCTAAGCATCAGCATGACAGCGT 564

Qy  395 GTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATCTCTACACCTGGTCCT 454
        ||||  |  |    || |||||||||||||| || ||  |     ||||||||     ||
Db  565 GTTCTTCAATTTTGGCTCTGGGAATGATGCCTCTTTGTCTGCTCATCTACACCACAATCT 624

Qy  455 GGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCC 514
        ||| |       |     ||   ||||||||  | || || || || || || ||| |  
Db  625 GGACTGCAGGCGATGCGATCCAGATTCCTTACGACAATATTGGGATCACACTGGTGAGTT 684

Qy  515 TGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAA 574
        ||    | ||||| |   ||||  | |  |||||  |||  ||||| |||    ||||||
Db  685 TGCTTGTGCCTGTCGGTCTTGGGATGTTAGTGAAACACAAGTGGCCTAAAGCTGCCAAAA 744

Qy  575 TCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTG 634
          || |||||| ||||  | || || ||    |||||| |  | ||  | |||||   ||
Db  745 AGATCCTCAAGGTTGGATCTGTGGTGGGAATCGTCCTCATCATCGTCATTGCAGTAATTG 804

Qy  635 GTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTCTGACCATCAGTTTCA 694
        |||  || ||     |    || |||| |   |      | |||  || |||  ||   |
Db  805 GTGGTGTGCTTTATCAGTCCTCATGGACTATTGCTCCTTCACTTTGGATCATTGGTACAA 864

Qy  695 TCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTT 754
        | | ||| || |||||   || |  ||| || ||  ||||||  |||     |||  |||
Db  865 TTTATCCATTTATTGGATTTGGCTTGGGGTTCCTCTTGGCACGCTTTGTGGGCCAACCTT 924

Qy  755 GGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCA 814
        |||| |||||| | || ||| |  ||||||| ||    |||||    ||| ||   |  |
Db  925 GGCACAGGTGCCGCACCATTGCTCTAGAAACAGGCATGCAGAACGCCCAGCTGGCAAGTA 984

Qy  815 CCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGG 874
        | ||   |||||| || || |   |||   |  | |     |||||    || ||  |  
Db  985 CTATTACCCAGTTGTCCTTTAGCCCTGCAGAGCTTGAGGTCATGTTCGCGTTTCCCTTAA 1044

Qy  875 CCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAGCATATCAGACGTACA 934
         |||  |  ||||||| ||| | |  |      ||   ||  ||    || |  |   ||
Db 1045 TCTACAGTATCTTCCAACTGGTTGTGGCTGGGATTGCAGTGTCAATTCATTACTCAATCA 1104

Qy  935 AGAG 938
        || |
Db 1105 AGCG 1108





RESULT 13
D87059
LOCUS D87059 974 bp mRNA linear ROD 07-FEB-1999
DEFINITION House mouse; Musculus domesticus mRNA for ileal Na+-dependent bile
acid transporter, partial cds.
ACCESSION D87059
VERSION D87059.1 GI:1504059
KEYWORDS ileal Na+-dependent bile acid transporter.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 974)
AUTHORS Saeki,T.
TITLE Mouse ileal Na-dependent bile acid transporter cDNA: Partial CDS
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 974)
AUTHORS Saeki,T.
TITLE Direct Submission
JOURNAL Submitted (09-AUG-1996) Tohru Saeki, Kyoto Prefectural University,
Department of Biological Resource Chemistry; Nakaragi, Shimogamo,
Sakyo-ku, Kyoto, Kyoto 606, Japan (E-mail:tsaeki@dns.kpu.ac.jp,
Tel:81-75-703-5663, Fax:81-75-703-5661)
FEATURES Location/Qualifiers
source 1..974
/organism="Mus musculus"
/mol_type="mRNA"
/strain="ICR"
/db_xref="taxon:10090"
/dev_stage="8 weeks"
CDS <1..>974
/codon_start=1
/product="ileal Na+-dependent bile acid transporter"
/protein_id="BAA13237.1"
/db_xref="GI:1504060"
/translation="PNATVCEGDSCVVPESNFNAILNTVMSTVLTILLAMVMFSMGCN
VEVHKFLGHIKRPWGIFVGFLCQFGIMPLTGFILSVASGILPVQAVVVLIMGCCPGGT
GSNILAYWIDGDMDLSVSMTTCSTLLALGMMPLCLFVYTKMWVDSGTIVIPYDSIGIS
LVALVIPVSFGMFVNHKWPQKAKIILKIGSITGVILIVLIAVIGGILYQSAWIIEPKL
WIIGTIFPIAGYSLGFFLARLAGQPWYRCRTVALETGMQNTQLCSTIVQLSFSPEDLN
LVFTFPLIYTVFQLVFAAVILGIYVTYRKCYGKNDAEFLEKTDNEMDS"
ORIGIN



Query Match 27.1%; Score 307.8; DB 10; Length 974;
Best Local Similarity 59.4%; Pred. No. 1.4e-82;
Matches 522; Conservative 0; Mismatches 357; Indels 0; Gaps 0;



Qy   80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139
        ||| ||  ||  |    ||  | |     |||  |||  |  |  | |   || | ||||
Db   56 ATGCAATTCTCAATACAGTGATGAGCACTGTGCTCACCATCCTCTTAGCCATGGTGATGT 115

Qy  140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199
        | ||| |||| ||    |||||  |||  ||| |       || || | |||||| ||||
Db  116 TTTCTATGGGGTGCAATGTGGAAGTCCACAAGTTCCTAGGACATATAAAGAGACCATGGG 175

Qy  200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259
        | ||    |||||  | ||||| ||||||||  |||||||| | ||||  | | ||||| 
Db  176 GTATCTTCGTGGGCTTCCTCTGTCAGTTTGGAATCATGCCTCTCACAGGCTTTATCCTGT 235

Qy  260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319
        |  |   || |    |    || || || ||| | |  || || || ||||| |||||||
Db  236 CTGTGGCCTCTGGCATCCTTCCTGTACAGGCTGTAGTGGTGCTAATTATGGGTTGCTGCC 295

Qy  320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379
        | || || ||   ||| || ||  |  |||  ||| | ||||| || ||||| |||||  
Db  296 CTGGAGGAACTGGCTCCAATATCCTGGCCTATTGGATAGATGGCGACATGGACCTCAGTG 355

Qy  380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439
        | || ||||| || || |||||  ||   ||||| ||||||||||| || ||| | |   
Db  356 TTAGCATGACCACTTGCTCCACACTGCTTGCCCTTGGAATGATGCCTCTTTGCCTCTTCG 415

Qy  440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499
        ||||||||  |   |||  |     |   |   |    ||||| ||| | | ||| || |
Db  416 TCTACACCAAGATGTGGGTTGACTCGGGAACGATTGTGATTCCCTATGATAGCATTGGTA 475

Qy  500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559
        || | || ||    ||    ||||||||  |||||||  | | ||| ||| ||| |||||
Db  476 TTTCTCTGGTTGCTCTTGTTATTCCTGTTTCCTTTGGAATGTTTGTAAATCACAAATGGC 535

Qy  560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619
        || || || | || || || || || |||||  || |    ||||   | ||| || || 
Db  536 CACAAAAAGCGAAGATTATACTTAAAATTGGATCCATCACAGGTGTAATTCTCATTGTGC 595

Qy  620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679
        |  | || ||   ||| |   | |||    || |  | ||||     ||   ||  ||  
Db  596 TCATAGCTGTGATTGGAGGAATACTGTACCAAAGTGCCTGGATCATTGAACCCAAACTGT 655

Qy  680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739
         ||  ||  |    || || ||| |   |||| |   |  ||||||  | |||||     
Db  656 GGATTATAGGAACAATATTCCCTATAGCTGGCTACAGCCTGGGTTTCTTCCTGGCTAGAC 715

Qy  740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799
        |  |    ||  | ||| | |||||| | ||| |  |||| |||||||||   ||||| |
Db  716 TAGCTGGTCAACCCTGGTACAGGTGCCGAACAGTAGCCTTGGAAACTGGAATGCAGAACA 775

Qy  800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859
         |||| |||||  ||||||  | ||| | || ||| |  | ||| |  |   || | |||
Db  776 CTCAGCTGTGCTCCACCATTGTACAGCTCTCCTTCTCCCCCGAGGATCTCAACCTGGTGT 835

Qy  860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919
        | |  ||||||||   ||||    | ||||||||  |   || |    | ||  | | | 
Db  836 TCACCTTCCCACTCATCTATACTGTTTTCCAGCTCGTCTTTGCAGCAGTAATATTAGGAA 895

Qy  920 CATATCAGACGTACAAGAGGAGATTGAAGAACAAACATG 958
          |||   || |||| ||   | |     || ||  |||
Db  896 TTTATGTCACATACAGGAAATGTTATGGAAAAAATGATG 934



RESULT 14
AB002693
LOCUS AB002693 1629 bp mRNA linear ROD 11-AUG-1999
DEFINITION Mus musculus mRNA for ISBT, complete cds.
ACCESSION AB002693
VERSION AB002693.1 GI:1944178
KEYWORDS ISBT.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 1629)
AUTHORS Saeki,T., Matoba,K., Furukawa,H., Kirifuji,K., Kanamoto,R. and
Iwami,K.
TITLE Characterization, cDNA cloning, and functional expression of mouse
ileal sodium-dependent bile acid transporter
JOURNAL J. Biochem. 125 (4), 846-851 (1999)
MEDLINE 99203592
PUBMED 10101301
REFERENCE 2 (bases 1 to 1629)
AUTHORS Saeki,T.
TITLE Direct Submission
JOURNAL Submitted (07-APR-1997) Tohru Saeki, Kyoto Prefectural University,
Department of Biological Resource Chemistry; Nakaragi, Shimogamo,
Sakyo-ku, Kyoto, Kyoto 606, Japan (E-mail:tsaeki@dns.kpu.ac.jp,
Tel:81-75-703-5663, Fax:81-75-703-5661)
FEATURES Location/Qualifiers
source 1..1629
/organism="Mus musculus"
/mol_type="mRNA"
/strain="ICR"
/db_xref="taxon:10090"
/sex="male"
/dev_stage="8 weeks"
CDS 50..1096
/codon_start=1
/product="ISBT"
/protein_id="BAA19606.1"
/db_xref="GI:1944179"
/translation="MDNSSVCPPNATVCEGDSCVVPESNFNAILNTVMSTVLTILLAM
VMFSMGCNVEVHKFLGHIKRPWGIFVGFLCQFGIMPLTGFILSVASGILPVQAVVVLI
MGCCPGGTGSNILAYWIDGDMDLSVSMTTCSTLLALGMMPLCLFVYTKMWVDSGTIVI
PYDSIGISLVALVIPVSFGMFVNHKWPQKAKIILKIGSITGVILIVLIAVIGGILYQS
AWIIEPKLWIIGTIFPIAGYSLGFFLARLAGQPWYRCRTVALETGMQNTQLCSTIVQL
SFSPEDLNLVFTFPLIYTVFQLVFAAVILGIYVTYRKCYGKNDAEFLEKTDNEMDSRP
SFDETNKGFQPDEK"
ORIGIN

Query Match 27.1%; Score 307.8; DB 10; Length 1629; 

Best Local Similarity 59.4%; Pred. No. 1.5e-82; 

Matches 522; Conservative 0; Mismatches 357; Indels 0; Gaps 0; 
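
The search header names the "sw model" with scoring table IDENTITY_NUC and Gapop 10.0, Gapext 1.0. As a minimal sketch of how such a local-alignment score can be computed, the function below uses the standard Gotoh recurrences for Smith-Waterman with affine gaps. It is not the GenCore implementation: the function name, the +1 match and -0.6 mismatch values (the latter inferred from the reported scores), and the convention that the first gapped position costs `gap_open` and each further one `gap_ext` are all assumptions.

```python
def smith_waterman_affine(q, d, match=1.0, mismatch=-0.6,
                          gap_open=10.0, gap_ext=1.0):
    """Return the best local alignment score of q against d (affine gaps)."""
    neg_inf = float("-inf")
    n, m = len(q), len(d)
    # H: best score ending at (i, j); E/F: best score ending in a gap.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap or extend an existing one, in either sequence.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_ext)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_ext)
            s = match if q[i - 1] == d[j - 1] else mismatch
            # Local alignment: scores are floored at zero.
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(smith_waterman_affine("ACGTACGT", "ACGAACGT"))
```

With a gap-open cost ten times the match score, short gapped alignments rarely beat ungapped ones, which is consistent with the hits in this listing all reporting Indels 0 and Gaps 0.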



Qy   80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139
        ||| ||  ||  |    ||  | |     |||  |||  |  |  | |   || | ||||
Db  129 ATGCAATTCTCAATACAGTGATGAGCACTGTGCTCACCATCCTCTTAGCCATGGTGATGT 188

Qy  140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199
        | ||| |||| ||    |||||  |||  ||| |       || || | |||||| ||||
Db  189 TTTCTATGGGGTGCAATGTGGAAGTCCACAAGTTCCTAGGACATATAAAGAGACCATGGG 248

Qy  200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259
        | ||    |||||  | ||||| ||||||||  |||||||| | ||||  | | ||||| 
Db  249 GTATCTTCGTGGGCTTCCTCTGTCAGTTTGGAATCATGCCTCTCACAGGCTTTATCCTGT 308

Qy  260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319
        |  |   || |    |    || || || ||| | |  || || || ||||| |||||||
Db  309 CTGTGGCCTCTGGCATCCTTCCTGTACAGGCTGTAGTGGTGCTAATTATGGGTTGCTGCC 368

Qy  320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379
        | || || ||   ||| || ||  |  |||  ||| | ||||| || ||||| |||||  
Db  369 CTGGAGGAACTGGCTCCAATATCCTGGCCTATTGGATAGATGGCGACATGGACCTCAGTG 428

Qy  380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439
        | || ||||| || || |||||  ||   ||||| ||||||||||| || ||| | |   
Db  429 TTAGCATGACCACTTGCTCCACACTGCTTGCCCTTGGAATGATGCCTCTTTGCCTCTTCG 488

Qy  440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499
        ||||||||  |   |||  |     |   |   |    ||||| ||| | | ||| || |
Db  489 TCTACACCAAGATGTGGGTTGACTCGGGAACGATTGTGATTCCCTATGATAGCATTGGTA 548

Qy  500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559
        || | || ||    ||    ||||||||  |||||||  | | ||| ||| ||| |||||
Db  549 TTTCTCTGGTTGCTCTTGTTATTCCTGTTTCCTTTGGAATGTTTGTAAATCACAAATGGC 608

Qy  560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619
        || || || | || || || || || |||||  || |    ||||   | ||| || || 
Db  609 CACAAAAAGCGAAGATTATACTTAAAATTGGATCCATCACAGGTGTAATTCTCATTGTGC 668

Qy  620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679
        |  | || ||   ||| |   | |||    || |  | ||||     ||   ||  ||  
Db  669 TCATAGCTGTGATTGGAGGAATACTGTACCAAAGTGCCTGGATCATTGAACCCAAACTGT 728

Qy  680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739
         ||  ||  |    || || ||| |   |||| |   |  ||||||  | |||||     
Db  729 GGATTATAGGAACAATATTCCCTATAGCTGGCTACAGCCTGGGTTTCTTCCTGGCTAGAC 788

Qy  740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799
        |  |    ||  | ||| | |||||| | ||| |  |||| |||||||||   ||||| |
Db  789 TAGCTGGTCAACCCTGGTACAGGTGCCGAACAGTAGCCTTGGAAACTGGAATGCAGAACA 848

Qy  800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859
         |||| |||||  ||||||  | ||| | || ||| |  | ||| |  |   || | |||
Db  849 CTCAGCTGTGCTCCACCATTGTACAGCTCTCCTTCTCCCCCGAGGATCTCAACCTGGTGT 908

Qy  860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919
        | |  ||||||||   ||||    | ||||||||  |   || |    | ||  | | | 
Db  909 TCACCTTCCCACTCATCTATACTGTTTTCCAGCTCGTCTTTGCAGCAGTAATATTAGGAA 968

Qy  920 CATATCAGACGTACAAGAGGAGATTGAAGAACAAACATG 958
          |||   || |||| ||   | |     || ||  |||
Db  969 TTTATGTCACATACAGGAAATGTTATGGAAAAAATGATG 1007



RESULT 15
OCSDBATRP
LOCUS OCSDBATRP 1116 bp mRNA linear MAM 12-OCT-1995
DEFINITION O.cuniculus mRNA for ileal sodium-dependent bile acid transporter.
ACCESSION Z54357
VERSION Z54357.1 GI:1019395
KEYWORDS ileal sodium-dependent bile acid transporter.
SOURCE Oryctolagus cuniculus (rabbit)
ORGANISM Oryctolagus cuniculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Lagomorpha; Leporidae; Oryctolagus.
REFERENCE 1 (bases 1 to 1116)
AUTHORS Stengelin,S., Apel,S., Becker,W., Maier,M., Rosenberger,J., Wess,G.
and Kramer,W.
TITLE Cloning of the rabbit ileal sodium-dependent bile acid transporter
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 1116)
AUTHORS Stengelin,S.
TITLE Direct Submission
JOURNAL Submitted (11-OCT-1995) Stengelin S., Hoechst Marion Roussel, TD
Metabolism, Building H825, D-65926 Frankfurt am Main, Germany
FEATURES Location/Qualifiers
source 1..1116
/organism="Oryctolagus cuniculus"
/mol_type="mRNA"
/strain="New Zealand White"
/db_xref="taxon:9986"
/tissue_type="ileum"
/dev_stage="adult"
5'UTR 1..41
CDS 42..1085
/codon_start=1
/product="ileal sodium-dependent bile acid transporter"
/protein_id="CAA91184.1"
/db_xref="GI:1019396"
/db_xref="GOA:Q28727"
/db_xref="SWISS-PROT:Q28727"
/translation="MSNLTVGCLANATVCEGASCVAPESNFNAILSVVLSTVLTILLA
LVMFSMGCNVEIKKFLGHIRRPWGIFIGFLCQFGIMPLTGFVLAVAFGIMPIQAVVVL
IMGCCPGGTASNILAYWVDGDMDLSVSMTTCSTLLALGMMPLCLYVYTKMWVDSGTIV
IPYDNIGTSLVALVVPVSIGMFVNHKWPQKAKIILKVGSIAGAVLIVLIAVVGGILYQ
SAWIIEPKLWIIGTIFPMAGYSLGFFLARIAGQPWYRCRTVALETGMQNTQLCSTIVQ
LSFSPEDLTYVFTFPLIYSIFQIAFAAIFLGIYVAYRKCHGKNDAEFPDIKDTKTEPE
SSFHQMNGGFQPE"
3'UTR 1086..1116
ORIGIN



Query Match 27.0%; Score 306.4; DB 4; Length 1116;
Best Local Similarity 59.2%; Pred. No. 3.7e-82;
Matches 523; Conservative 0; Mismatches 361; Indels 0; Gaps 0;



Qy   80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139
        |||  | |||     | ||| | |     |||   ||  |  || |||  ||| ||||||
Db  124 ATGCCATCCTCAGCGTGGTTCTGAGTACCGTGCTGACCATCCTGCTGGCTCTGGTCATGT 183

Qy  140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199
        ||||  |||||||   |||||| |||  |||  |   |  |||||| ||| | |||||||
Db  184 TCTCCATGGGATGCAACGTGGAAATCAAGAAATTCCTGGGGCACATAAGGCGGCCCTGGG 243

Qy  200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259
        ||||     | ||  | ||||||||||||||| |||||||  | || |  | | |||| |
Db  244 GCATCTTCATTGGCTTCCTCTGCCAGTTTGGGATCATGCCCCTCACGGGATTTGTCCTAG 303

Qy  260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319
        |  |   ||||    | | |||  |||| ||  | |  || ||||||||||| |||||||
Db  304 CGGTGGCCTTTGGGATCATGCCCATCCAGGCCGTGGTGGTGCTCATCATGGGATGCTGCC 363

Qy  320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379
        | || || ||   ||| |||||  |  |||  ||||| |||||||| |||||  | ||  
Db  364 CTGGAGGAACGGCCTCCAACATCCTGGCCTATTGGGTGGATGGAGACATGGACTTGAGTG 423

Qy  380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439
        |||| ||||| ||||| |||||  ||   ||||| |||||||||||  | ||| ||||| 
Db  424 TCAGCATGACCACCTGCTCCACATTGCTTGCCCTCGGAATGATGCCGTTATGCCTTTATG 483

Qy  440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499
        ||||||||      |||           ||   |    ||||||||  | |||||||| |
Db  484 TCTACACCAAAATGTGGGTGGACTCTGGGACCATTGTAATTCCTTACGATAACATAGGTA 543

Qy  500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559
         | || | ||    ||     |||| ||  || | ||  | | ||| ||| |||  ||||
Db  544 CTTCCTTGGTTGCTCTTGTTGTTCCCGTTTCCATCGGAATGTTTGTTAATCACAAGTGGC 603

Qy  560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619
        |  || |  | || || || || ||  ||||  || |||  ||||  |||||  || || 
Db  604 CCCAAAAGGCGAAGATTATACTTAAAGTTGGATCCATTGCAGGTGCAGTCCTTATTGTGC 663

Qy  620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679
        |  | || || |  || |   |  ||    || |  | ||||     ||   ||  ||  
Db  664 TCATAGCTGTGGTAGGAGGAATATTGTACCAAAGTGCCTGGATCATCGAACCCAAGCTGT 723

Qy  680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739
         ||  ||  |    || |||||  ||   ||  |   |   || ||| |  ||||     
Db  724 GGATTATAGGGACGATATTTCCCATGGCCGGTTACTCCCTTGGCTTTTTTTTGGCTAGGA 783

Qy  740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799
        |  |    ||| | ||| | |||||| | ||| || | || ||||| ||    ||||| |
Db  784 TAGCTGGTCAGCCATGGTACAGGTGCCGAACAGTTGCTTTGGAAACAGGGATGCAGAACA 843

Qy  800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859
          ||| |||||  ||| ||  | ||| | || ||||   | ||| || |     |  |||
Db  844 CACAGCTGTGCTCCACGATCGTGCAGCTCTCCTTCAGCCCCGAGGACCTCACTTATGTGT 903

Qy  860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919
        | |  ||||| ||    ||  |  |||||||| |      || |    |  |  | | | 
Db  904 TCACCTTCCCGCTCATTTACAGCATCTTCCAGATCGCCTTTGCAGCAATCTTCTTAGGAA 963

Qy  920 CATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGAAAA 963
         ||||    | || | ||   |      ||| ||  ||| | ||
Db  964 TATATGTCGCATATAGGAAATGTCATGGGAAAAATGATGCAGAA 1007



Search completed: March 25, 2004, 17:56:19
Job time : 4669 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model

Run on: March 25, 2004, 08:03:39 ; Search time 575 Seconds
(without alignments)
8378.185 Million cell updates/sec

Title: US-10-091-628-1
Perfect score: 1134
Sequence: 1 atgagagccaattgttccag...acatcacttcatgtgaatag 1134

Scoring table: IDENTITY_NUC
Gapop 10.0 , Gapext 1.0

Searched: 3373863 seqs, 2124099041 residues

Total number of hits satisfying chosen parameters: 6747726

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries

Database : N_Geneseq_29Jan04:*
1: geneseqn1980s:*
2: geneseqn1990s:*
3: geneseqn2000s:*
4: geneseqn2001as:*
5: geneseqn2001bs:*
6: geneseqn2002s:*
7: geneseqn2003as:*
8: geneseqn2003bs:*
9: geneseqn2003cs:*
10: geneseqn2004s:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
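
The throughput figure in the header can be reproduced from the other header values. The sketch below is an inference from the numbers, not a documented formula: it assumes one dynamic-programming cell per query residue per database residue, and a factor of 2 for searching both strands (suggested by the hit count being twice the number of sequences searched). The function name is hypothetical.

```python
def million_cell_updates_per_sec(query_len, db_residues, seconds, strands=2):
    """Cells filled per second, in millions, for a full-matrix search."""
    # strands=2 is an assumption: each database residue is compared twice.
    return query_len * db_residues * strands / seconds / 1e6


# Values from the Geneseq search header above.
rate = million_cell_updates_per_sec(1134, 2124099041, 575)
print(f"{rate:.3f}")
```

The same relation, applied to the earlier search (14931090276 residues in 3413 seconds), gives a value that agrees with the 9921.979 reported there to within rounding.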

SUMMARIES

Result            Query
   No.    Score   Match  Length  DB  ID        Description

     1     1134   100.0    1134   6  AAD46333  Aad46333 Human sod
     2     1134   100.0    1600   6  AAD46334  Aad46334 Human sod
     3   1130.8    99.7    1517   7  AAD47353  Aad47353 Human tra
     4    655.8    57.8     987   6  ABS59328  Abs59328 Human ile
     5    320.4    28.3    2263   2  AAQ91108  Aaq91108 Hamster i
     6    297.8    26.3    1047   2  AAQ91109  Aaq91109 Human ile
     7    297.8    26.3    3779   7  ACF63388  Acf63388 Human IBA
     8    297.8    26.3
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ALIGNMENTS 



RESULT 1 
AAD46333 

ID AAD46333 standard; DNA; 1134 BP.
XX
AC AAD46333;
XX
DT 27-JAN-2003 (first entry)
XX
DE Human sodium/bile-like transporter DNA #1.
XX
KW Human; sodium/bile-like transporter; novel human protein; drug screening;
KW NHP; cancer; cosmetic; nutriceutical; gene therapy; cytostatic; gene;
KW chromosome 4; ds.
XX
OS Homo sapiens.
XX
FH Key Location/Qualifiers
FT CDS 1..1134
FT /*tag= a
FT /product= "Human sodium/bile-like transporter"
XX
PN WO200272774-A2.
XX
PD 19-SEP-2002.
XX
PF 06-MAR-2002; 2002WO-US007438.
XX
PR 12-MAR-2001; 2001US-0275009P.
PR 17-APR-2001; 2001US-0284152P.
XX
PA (LEXI-) LEXICON GENETICS INC.
XX
PI Wilganowski NL, Nepomnichy B, Burnett MB, Hu Y;
XX
DR WPI; 2002-723334/78.
DR P-PSDB; AAE28936.
XX
PT New protein and nucleic acid molecule, useful for diagnosing or treating
PT diseases, e.g. cancer, for drug screening, clinical trial monitoring,
PT pharmacogenomics, and for cosmetic or nutriceutical applications.
XX
PS Claim 1; Page 37; 41pp; English.
XX
CC The invention relates to novel human proteins (NHP), sodium/bile-like
CC transporter and their nucleic acids. The invention is useful for
CC identifying the protein which may be used for diagnosis, clinical trial
CC monitoring, drug screening, pharmacogenomics, treatment of diseases such
CC as cancer, and for cosmetic or nutriceutical applications. The nucleic
CC acid molecule may also be used as hybridisation probes for screening
CC libraries, assessing gene expression patterns, and in amplification
CC assays. It is also used in gene therapy. The present sequence is human
CC sodium/bile-like transporter DNA. This gene is located at chromosome 4
XX
SQ Sequence 1134 BP; 251 A; 281 C; 287 G; 315 T; 0 U; 0 Other;

Query Match 100.0%; Score 1134; DB 6; Length 1134; 
Best Local Similarity 100.0%; Pred. No. 0; 

Matches 1134; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 
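
In the alignment blocks, the row between each Qy and Db line marks identical positions. A minimal sketch of how that row can be regenerated from the two sequence rows is given below; the function name and the choice of "|" as the marker are assumptions, and gap characters are not handled because every hit in this listing reports Indels 0.

```python
def match_line(qy, db, mark="|"):
    """Return a row with `mark` under identical positions, space elsewhere."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have the same length")
    return "".join(mark if a == b else " " for a, b in zip(qy, db))


# First 20 bases of the query against itself: every position matches.
qy = "ATGAGAGCCAATTGTTCCAG"
print(match_line(qy, qy))
# A short mismatching pair, as in the final block of an earlier alignment.
print(match_line("AGAG", "AGCG"))
```

Counting the markers over a whole alignment gives the "Matches" figure in the summary line, which is a convenient check when the marker row has been damaged.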

Qy    1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60

Qy   61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120

Qy  121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180

Qy  181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240

Qy  241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300

Qy  301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360

Qy  361 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 420

Qy  421 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 480

Qy  481 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 540

Qy  541 TATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTT 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  541 TATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTT 600

Qy  601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 660
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 660

Qy  661 AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 720
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  661 AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 720

Qy  721 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 780
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  721 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 780

Qy  781 GAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCT 840
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  781 GAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCT 840

Qy  841 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 900
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  841 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 900

Qy  901 GGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGA 960
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  901 GGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGA 960

Qy  961 AAAAAGAACTCAGGTTGCACAGAAGTCTGCCATACGAGGAAATCGACTTCTTCCAGAGAG 1020
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  961 AAAAAGAACTCAGGTTGCACAGAAGTCTGCCATACGAGGAAATCGACTTCTTCCAGAGAG 1020

Qy 1021 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1080
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1021 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1080

Qy 1081 ATGGATTGCCACAGGGCTCTCGAGCCAGTTGGCCACATCACTTCATGTGAATAG 1134
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1081 ATGGATTGCCACAGGGCTCTCGAGCCAGTTGGCCACATCACTTCATGTGAATAG 1134



RESULT 2 
AAD46334 

ID AAD46334 standard; DNA; 1600 BP. 
XX 

AC AAD46334; 
XX 

DT 27-JAN-2003 (first entry) 
XX 

DE Human sodium/bile-like transporter DNA #2. 
XX 

KW Human; sodium/bile-like transporter; novel human protein; drug screening;

KW NHP; cancer; cosmetic; nutriceutical; gene therapy; cytostatic; gene;

KW chromosome 4; ds.
XX 

OS Homo sapiens. 
XX 

PN WO200272774-A2. 
XX 

PD 19-SEP-2002. 
XX 

PF 06-MAR-2002; 2002WO-US007438.
XX

PR 12-MAR-2001; 2001US-0275009P.

PR 17-APR-2001; 2001US-0284152P.
XX 

PA (LEXI-) LEXICON GENETICS INC. 
XX 

PI Wilganowski NL, Nepomnichy B, Burnett MB, Hu Y; 
XX 

DR WPI; 2002-723334/78. 
XX 

PT New protein and nucleic acid molecule, useful for diagnosing or treating 

PT diseases, e.g. cancer, for drug screening, clinical trial monitoring, 

PT pharmacogenomics, and for cosmetic or nutriceutical applications.
XX 

PS Disclosure; Page 38-39; 41pp; English. 
XX 

CC The invention relates to novel human proteins (NHP), sodium/bile-like 

CC transporter and their nucleic acids. The invention is useful for 

CC identifying the protein which may be used for diagnosis, clinical trial 

CC monitoring, drug screening, pharmacogenomics, treatment of diseases such 

CC as cancer, and for cosmetic or nutriceutical applications. The nucleic 

CC acid molecule may also be used as hybridisation probes for screening 

CC libraries, assessing gene expression patterns, and in amplification 

CC assays. It is also used in gene therapy. The present sequence is human 

CC sodium/bile-like transporter DNA. This gene is located at chromosome 4 
XX 

SQ Sequence 1600 BP; 367 A; 366 C; 399 G; 468 T; 0 U; 0 Other; 



Query Match 100.0%; Score 1134; DB 6; Length 1600; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 1134; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 



Qy          1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        194 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 253

Qy         61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        254 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 313

Qy        121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        314 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 373

Qy        181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        374 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 433

Qy        241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        434 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 493

Qy        301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        494 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 553

Qy        361 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 420
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        554 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 613

Qy        421 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 480
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        614 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 673

Qy        481 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 540
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        674 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 733

Qy        541 TATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTT 600
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        734 TATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTT 793

Qy        601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 660
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        794 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 853

Qy        661 AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 720
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        854 AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 913

Qy        721 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 780
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        914 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 973

Qy        781 GAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCT 840
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        974 GAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCT 1033

Qy        841 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 900
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1034 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 1093

Qy        901 GGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGA 960
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1094 GGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGA 1153

Qy        961 AAAAAGAACTCAGGTTGCACAGAAGTCTGCCATACGAGGAAATCGACTTCTTCCAGAGAG 1020
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1154 AAAAAGAACTCAGGTTGCACAGAAGTCTGCCATACGAGGAAATCGACTTCTTCCAGAGAG 1213

Qy       1021 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1080
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1214 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1273

Qy       1081 ATGGATTGCCACAGGGCTCTCGAGCCAGTTGGCCACATCACTTCATGTGAATAG 1134
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1274 ATGGATTGCCACAGGGCTCTCGAGCCAGTTGGCCACATCACTTCATGTGAATAG 1327



RESULT 3 
AAD47353 



ID AAD47353 standard; cDNA; 1517 BP.
XX
AC AAD47353;
XX
DT 24-FEB-2003 (first entry)
XX
DE Human transporter and ion channel (TRICH) cDNA #6.
XX
KW Human; transporter and ion channel; TRICH; neurodegenerative disorder;
KW Parkinson's disease; Alzheimer's disease; muscular disorder; transgenic;
KW myotonic dystrophy; catatonia; endocrine disorder; diabetes; cytostatic;
KW Grave's disease; cancer; leukaemia; cervical; immunological; scleroderma;
KW systemic lupus erythematosus; allergy; gastrointestinal; Crohn's disease;
KW Goodpasture's syndrome; infection; cardiovascular; fungicide; nootropic;
KW hepatic disease; cirrhosis; gene therapy; uropathic; anti-HIV; virucide;
KW atherosclerosis; antiparasitic; protozoacide; antibacterial; gene; ss.
XX
OS Homo sapiens.
XX
FH Key Location/Qualifiers
FT CDS 249..1382
FT /*tag= a
FT /product= "Human TRICH protein"
FT sig_peptide 361..539
FT /*tag= b
FT mat_peptide 540..1379
FT /*tag= c
FT /product= "Human mature TRICH protein"
XX
PN WO200277237-A2.
XX
PD [illegible]-2002.
XX
PF [illegible]-2002; [illegible].
XX
PR [illegible]-2001; 2001US-[illegible].
PR [illegible]-2001; 2001US-[illegible].
PR 02-MAR-2001; 2001US-0272890P.
PR 16-MAR-2001; 2001US-0276860P.
PR 23-MAR-2001; 2001US-0278255P.
PR 30-MAR-2001; 2001US-0280538P.
PR 25-JAN-2002; 2002US-0351359P.



XX 

PA (INCY-) INCYTE GENOMICS INC. 
XX 

PI Lee EA, Ding L, Baughn MR, Tribouley CM, Bruns CM, Elliott VS; 

PI Walia NK, Forsythe IJ, Raumann BE, Burford N, Lai PG, Thornton M; 

PI Gandhi AR, Arvizu C, Yao MG, Yue H, Xu Y, Hafalia AJA, Ison CH; 

PI Chen H; 
XX 

DR WPI; 2003-018931/01. 

DR P-PSDB; AAE29906. 
XX 

PT New TRICH polypeptides, useful for diagnosing, preventing, and treating 

PT disorders associated with an abnormal expression or activity of TRICH, 

PT e.g. neuromuscular, immunological, cardiovascular disorders, cancer and 

PT infection. 
XX 

PS Claim 5; Page 197-198; 214pp; English. 
XX 

CC The invention relates to human transporters and ion channels (TRICH) and 

CC their nucleic acids. The sequences of the invention are useful in 

CC diagnosing, preventing, and treating disorders associated with an 

CC abnormal expression or activity of TRICH, such as neurodegenerative 

CC disorders (e.g. Parkinson's disease, Alzheimer's disease), muscular

CC disorders (e.g. myotonic dystrophy, catatonia), endocrine disorders (e.g. 

CC diabetes, Grave's disease), cancers (e.g. leukaemia, cervical or breast 

CC cancers), immunological disorders (e.g. scleroderma, systemic lupus

CC erythematosus, allergies), gastrointestinal disorders (e.g. Crohn's 

CC disease), renal disorders (e.g. Goodpasture's syndrome), infections (e.g. 

CC viral, bacterial, fungal, parasitic, protozoal, helminthic),

CC cardiovascular disorders (e.g. atherosclerosis), or hepatic diseases 

CC (e.g. cirrhosis). TRICH or its fragments may also be used in screening 

CC for compounds that specifically bind to and modulate its activity. TRICH 

CC DNA can be used to create humanised animals or transgenic animals to 

CC model human disease. It is also used in gene therapy. The present 

CC sequence is human TRICH cDNA 

XX 

SQ Sequence 1517 BP; 356 A; 352 C; 374 G; 435 T; 0 U; 0 Other; 

Query Match 99.7%; Score 1130.8; DB 7; Length 1517; 

Best Local Similarity 99.8%; Pred. No. 0; 

Matches 1132; Conservative 0; Mismatches 2; Indels 0; Gaps 0; 

Qy          1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        249 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 308

Qy         61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120
              |||||||||||||||| |||||||||||||||||||||||||||||||||| ||||||||
Db        309 CCAGTGGGACTGGAGGCGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGCCCACTGTG 368

Qy        121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        369 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 428

Qy        181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        429 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 488

Qy        241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        489 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 548

Qy        301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        549 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 608

Qy        361 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 420
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        609 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 668

Qy        421 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 480
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        669 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 728

Qy        481 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 540
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        729 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 788

Qy        541 TATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTT 600
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        789 TATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTT 848

Qy        601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 660
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        849 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 908

Qy        661 AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 720
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        909 AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 968

Qy        721 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 780
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        969 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 1028

Qy        781 GAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCT 840
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1029 GAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCT 1088

Qy        841 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 900
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1089 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 1148

Qy        901 GGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGA 960
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1149 GGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGA 1208

Qy        961 AAAAAGAACTCAGGTTGCACAGAAGTCTGCCATACGAGGAAATCGACTTCTTCCAGAGAG 1020
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1209 AAAAAGAACTCAGGTTGCACAGAAGTCTGCCATACGAGGAAATCGACTTCTTCCAGAGAG 1268

Qy       1021 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1080
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1269 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1328

Qy       1081 ATGGATTGCCACAGGGCTCTCGAGCCAGTTGGCCACATCACTTCATGTGAATAG 1134
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1329 ATGGATTGCCACAGGGCTCTCGAGCCAGTTGGCCACATCACTTCATGTGAATAG 1382



RESULT 4 
ABS59328 

ID ABS59328 standard; DNA; 987 BP. 
XX 

AC ABS59328; 
XX 

DT 05-NOV-2002 (first entry) 
XX 

DE Human ileal sodium/bile acid cotransporter-like gene. 
XX 

KW Human; NOVX; cardiomyopathy; atherosclerosis; cell signal processing; 

KW breast cancer; Alzheimer's disease; epilepsy; Huntington's disease; 

KW anxiety; behavioural disorder; multiple sclerosis; myasthenia gravis; 

KW neurodegeneration; Parkinson's disease; pain; stroke; endometriosis; 

KW autoimmune disease; allergy; addiction; asthma; transplantation; 

KW graft versus host disease; systemic lupus erythematosus; scleroderma; 

KW psoriasis; Crohn's disease; HIV infection; human immunodeficiency virus; 

KW atherosclerosis; cirrhosis; rheumatoid arthritis; diabetes; pancreatitis; 

KW thrombocytopenia; bleeding disorder; metabolic disorder; obesity; 

KW glucose transport defect; glomerulonephritis; hypercalcaemia; 

KW polycystic kidney disease; renal tubular acidosis; skin disorder; 

KW congenital diarrhoea; respiratory disease; gastro-intestinal disease; 

KW muscle disorder; bone disorder; joint disorder; skeletal disorder; 

KW haematopoietic disorder; urinary system disorder; osteoporosis; ds; 

KW dental disease; dental infection; growth disorder; reproductive disorder; 

KW hypogonadism; fertility disorder; viral infection; bacterial infection; 

KW parasitic infection; metabolic pathway modulation; gene therapy; gene; 

KW zinc metalloprotease; ADAM-TS 7; alpha-2-macroglobulin precursor; 

KW ileal sodium/bile acid cotransporter; prohibitin; MT; CIP4; spinesin;

KW macrophage stimulating protein precursor; fatty acid-binding protein; 

KW gap junction beta-5 protein; hepsin/plasma transmembrane serine protease. 

XX 

OS Homo sapiens. 
XX 

PN WO200233087-A2. 
XX 

PD 25-APR-2002. 
XX 

PF 17-OCT-2001; 2001WO-US032496.
XX

PR 17-OCT-2000; 2000US-0241040P.
PR [illegible].
PR 29-[illegible]-2000; 2000US-[illegible].
PR 20-FEB-2001; 2001US-0269813P.
PR 25-APR-2001; 2001US-0286324P.
PR 29-MAY-2001; 2001US-0294108P.
PR 09-JUL-2001; 2001US-0303698P.
PR 16-[illegible]-2001; 2001US-00981151.



XX 

PA (CURA-) CURAGEN CORP. 
XX 

PI Edinger S, Gerlach V, Macdougall JR, Malyankar UM, Smithson G; 

PI Millet I, Peyman JA, Stone DJ, Gunther E, Ellerman K, Shimkets RA; 

PI Padigaru M, Guo X, Patturajan M, Taupier RJ, Burgess CE; 

PI Zerhusen BD, Kekuda R, Spytek KA, Gangolli EA, Fernandes ER; 

PI Gorman L; 

XX 

DR WPI; 2002-590434/63. 

DR P-PSDB; ABG76899. 
XX 

PT Cytoplasmic, nuclear, membrane bound and secreted polypeptides and 

PT nucleic acids encoding the polypeptides for diagnosing and treating e.g. 

PT cancer, Alzheimer's disease, cardiomyopathy, metabolic disease and

PT diabetes. 

XX 

PS Claim 8; Page 50; 305pp; English. 
XX 

CC The present invention relates to new NOVX (NOV1-10) polypeptides. The

CC molecules of the invention are useful for treating or preventing a NOVX- 

CC associated disorder, such as cardiomyopathy, atherosclerosis, or a 

CC disorder related to cell signal processing and metabolic pathway 

CC modulation in humans. NOVX polypeptides, nucleic acids and antibodies are 

CC useful for treating or preventing disorders or syndromes including breast 

CC cancer, Alzheimer's disease, epilepsy, Huntington's disease, anxiety,

CC behavioural disorders, multiple sclerosis, myasthenia gravis, 

CC neurodegeneration, Parkinson's disease, pain, stroke, autoimmune disease, 

CC allergies, addiction, asthma, endometriosis, graft versus host disease, 

CC systemic lupus erythematosus, scleroderma, transplantation, psoriasis, 

CC Crohn's disease, HIV (human immunodeficiency virus) infection, 

CC atherosclerosis, cirrhosis, rheumatoid arthritis, diabetes, 

CC thrombocytopenia, bleeding disorders, metabolic disorders, obesity, 

CC glucose transport defect, glomerulonephritis, hypercalcaemia, polycystic 

CC kidney disease, pancreatitis, renal tubular acidosis, skin disorders, 

CC congenital diarrhoea, respiratory disease, gastro-intestinal diseases, 

CC muscle, bone, joint and skeletal disorders, haematopoietic disorders, 

CC urinary system disorders, osteoporosis, dental disease and infection, 

CC growth and reproductive disorders, hypogonadism, fertility, and/or other 

CC pathologies and disorders, viral, bacterial, or parasitic infections. The 

CC present nucleic acid sequence encodes a NOVX protein of the invention 



SQ Sequence 987 BP; 206 A; 243 C; 236 G; 302 T; 0 U; 0 Other;

Query Match 57.8%; Score 655.8; DB 6; Length 987;

Best Local Similarity 86.3%; Pred. No. 1.5e-195;

Matches 803; Conservative 0; Mismatches 77; Indels 51; Gaps 5;

Qy          1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db          1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60

Qy         61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
Db         61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTATC 120

Qy        121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180

Qy        181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240

Qy        241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300

Qy        301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360
              ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
Db        301 CTCATCATGGGCTGCTG-CCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 359

Qy        361 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 420
              ||||||||||||||||                             |  ||||||||||||
Db        360 GGAGATATGGATCTCA-----------------------------GGTGCCCTGGGAATG 390

Qy        421 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 480
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        391 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 450

Qy        481 CCTTATCAGAACA------------TAGGAATTACCCTTGTGTGCCTGACCATTCCTGTG 528
              |||||||||||||            |||||||||||||||||||||||||||||||||||
Db        451 CCTTATCAGAACATAGGTCTGTCTTTAGGAATTACCCTTGTGTGCCTGACCATTCCTGTG 510

Qy        529 GCCTTTGGTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATT 588
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        511 GCCTTTGGTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAA---- 566

Qy        589 GGGGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCG 648
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        567 --GGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCG 624

Qy        649 AAAGGATCTTGGAATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATT 708
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        625 AAAGGATCTTGGAATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATT 684

Qy        709 GGCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGG 768
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||  |  |
Db        685 GGCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGACCTTG 744

Qy        769 ACAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTA 828
               | || |  ||||    |  ||||   || |  |        | |    |      |
Db        745 CCTATCTTTTTAG---GTTTAGCTTTCAAGACACCCTGTGATACCCTACTCGCAATGACT 801

Qy        829 TCTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTC 888
              || |   |||       |  | ||             | || |||   ||||||||||||
Db        802 TCGTGTCCTGAATGTTCCAGGCTCATCTATGCCTTCATTCCTCTGCTATATGGACTCTTC 861

Qy        889 CAGCTGATAGATGGATTTCTTATTGTTGCAG 919
              |||||||||||||||||||||||||||| ||
Db        862 CAGCTGATAGATGGATTTCTTATTGTTGAAG 892
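
The summary figures printed with each result can be cross-checked against the counts on its Matches line. The sketch below is one reading that happens to reproduce the numbers reported in this listing for the results with scores 1130.8, 655.8 and 320.4; it is not taken from the search program's documentation. The gap costs use the Gapop 10.0 and Gapext 1.0 values from the run header, and 1134 is the perfect score from the same header. The mismatch penalty of 0.6 and all function names are assumptions, chosen only because they make the arithmetic agree with the printed scores.

```python
# Values copied from the run header of this listing.
PERFECT_SCORE = 1134   # "Perfect score"
GAP_OPEN = 10.0        # "Gapop"
GAP_EXTEND = 1.0       # "Gapext"

# ASSUMPTION: not stated anywhere in the listing; inferred from the
# reported scores (for example 1132 matches, 2 mismatches -> 1130.8).
MISMATCH_PENALTY = 0.6


def alignment_score(matches, mismatches, indels, gaps):
    """Score as +1 per match, minus mismatch and gap costs.

    Each gap is charged GAP_OPEN once, and each gapped position
    (counted in `indels`) is charged GAP_EXTEND.
    """
    return (matches
            - MISMATCH_PENALTY * mismatches
            - GAP_OPEN * gaps
            - GAP_EXTEND * indels)


def query_match(score):
    """'Query Match' percentage: score relative to the perfect score."""
    return 100.0 * score / PERFECT_SCORE


def best_local_similarity(matches, mismatches, indels):
    """'Best Local Similarity': matches over all aligned columns."""
    return 100.0 * matches / (matches + mismatches + indels)


if __name__ == "__main__":
    # Counts taken from the Matches line of the 987 BP entry above.
    score = alignment_score(matches=803, mismatches=77, indels=51, gaps=5)
    print(round(score, 1),
          round(query_match(score), 1),
          round(best_local_similarity(803, 77, 51), 1))
```

Rounding to one decimal place mirrors the precision used in the report; the unrounded values carry ordinary floating-point noise.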



RESULT 5 
AAQ91108 

ID AAQ91108 standard; cDNA; 2263 BP. 
XX 

AC AAQ91108; 
XX 

DT 17-DEC-1995 (first entry) 
XX 

DE Hamster ileal/renal bile acid cotransporter.
XX 

KW Ileal/renal bile acid cotransporter; therapeutic; gene therapy; 

KW diagnostic; ss. 

XX 

OS Cricetulus griseus. 
XX 

FH Key Location/Qualifiers 
FT CDS 109. .1152 

FT /*tag= a 

XX 

PN WO9517905-A1. 
XX 

PD 06-JUL-1995. 
XX 

PF 29-DEC-1994; 94WO-US014431.
XX

PR 29-DEC-1993; 93US-00176126.
XX 

PA (UYWA-) UNIV WAKE FOREST. 
XX 

PI Dawson PA; 
XX 

DR WPI; 1995-246189/32. 
DR P-PSDB; AAR77224. 
XX 

PT Hamster and human ileal and bile acid transport DNA and protein - useful 

PT in treatment of e.g. hypercholesterolemia, diabetes and various 

PT digestive diseases, and in gene therapy to restore bile acid uptake

PT activity. 

XX 

PS Claim 4; Page 98-103; 148pp; English. 
XX 

CC The ileal/renal bile acid cotransporter cDNA is cloned in an expression 
CC vector (plasmid pCMX or plasmid pCMV5) under the control of a baculo

CC virus Autographa californica nuclear-polyhedrosis virus gene promoter,

CC the cytomegalo virus immediate early gene promoter, the SV40 virus late

CC gene promoter or an inducible promoter e.g. the lactose operon promoter,

CC and expressed in CHO, MDCK, CaCo2, BHK or preferably COS-1A cells. The

CC cotransporter is useful in the treatment of hypercholesterolemia, 

CC diabetes, heart disease, liver disease and various digestive disorders. 

CC The cDNA may by used in gene therapy to restore bile acid uptake activity 

CC to patients whose ileum has been surgically resected for diseases such as 

CC Crohn disease, patients born with congenital defects in the bile 

CC transporter, and patients suffering from adult-onset chronic idiopathic 

CC bile acid diarrhoea. The DNA and protein may be used in screening methods 

CC as modulators of ileal/renal bile acid cotransport activity. The DNA can 

CC also be used to detect mutations and RFLPs in human ileal/renal bile acid 

CC cotransporter genes by amplification with primers (see AAQ91110-15) 

XX 

SQ Sequence 2263 BP; 672 A; 451 C; 476 G; 664 T; 0 U; 0 Other; 

Query Match 28.3%; Score 320.4; DB 2; Length 2263; 

Best Local Similarity 60.8%; Pred. No. 1.3e-89; 

Matches 522; Conservative 0; Mismatches 336; Indels 0; Gaps 0; 



Qy 


8 0 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 

II I I 1 1 III II IN Ml 1 1 II 1 1 1 1 1 1 1 

188 ACGCCATCCTCAGCGTGGTGATGAGCACCGTGCTCACAATCCTCCTAGCCTTGGTGATGT 


139 


Db 


247 


Qy 


140 


TCTCTTTGG GAT GT T C C GT G GAGAT C C G GAAGCT GT GGT C GCAC AT C AGGAGAC C CT G G G 

III 1 1 1 1 II 1 1 1 II III 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

TTTCCATGGGGTGCAATGTGGAACTCCACAAGTTTCTGGGACACCTAAGGCGGCCATGGG 


199 


Db 


248 


307 


Qy 


200 


GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 

1 1 1 1 1 1 II 1 1 1 1 II 1 1 1 1 1 1 II 1 1 1 1 1 M 1 M 1 1 

GCATCGTCGTGGGCTTCCTCTGTCAGTTTGGAATCATGCCTCTCACAGGTTTCGTCCTGT 


259 


Db 


308 


367 


Qy 


260 


CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 

Ml M 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

CCGTGGCCTTTGGCATCCTCCCAGTGCAAGCTGTGGTGGTGCTGATCCAGGGTTGCTGCC 


319 


Db 


368 


427 


Qy 


320 


CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 

| I I I I I 1 1 1 1 1 1 1 1 1 III 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

CT GGAGGAACTGCCT CCAATAT CCTAGCCTATT GGGTAGATGGCGACATGGACCTCAGC G 


379 


Db 


428 


487 


Qy 


380 


TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 

I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

TTAGCATGACCACCTGCTCCACGCTGCTTGCCCTTGGAATGATGCCCCTTTGCCTCTTCA 


439 


Db 


488 


547 


Qy 


440 


T CTACACCT GGT CCT GGAGT CTT CAGCAGAAT CT CACCATT CCTT AT CAGAACATAGGAA 

M 1 1 1 1 1 1 Ml 1 II 1 1 1 1 1 1 1 II 1 1 1 1 1 1 II 1 

T CT AT AC CAAGAT GT GG GTT GACT C AGG GAC GATT GT G ATT C CTT AT GACAGCAT T GGC A 


499 


Db 


548 


607 


Qy 


500 


TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 

| I I I 1 1 II 1 1 1 1 1 1 II 1 1 1 1 II 1 1 1 1 1 1 1 I 1 1 1 1 1 1 I 1 1 1 

CTTCTCTGGTTGCTCTTGTTATTCCTGTTTCCATTGGAATGTATGTGAATCACAAATGGC 


559 


Db 


608 


667 


Qy 


560 


CAAAACAATCCAJ^AATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 

I 1 1 1 1 1 1 1 1 1 II 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

C C C AAAAAG C AAAGAT C AT ACT T AAAAT T GGAT C CAT C G C AGGT G C AAT T CT CAT T GT T C 


619 


Db 


668 


727 



Qy 620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679 

I I I I I III Ml I I I I I III I I I I I II I I I I 

Db 728 T C AT CGCT GT G GT T G GAGGAAT ACT GT ACCAAAGT GC CT G GAC CATT GAACC CAAGCT GT 787 

Qy 680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739 

II II I I I I I I I II I II I I I I I I I I I II I I I I I I 

Db 78 8 GGATTATAGGAACCATATATCCTATAGCTGGCTACGGCCTGGGGTTTTTCCTGGCTAGAA 847 

Qy. 740 TT AC C CACCAGT CTT GGC AAAGGT GCAGGACAATTT C CTT AGAAACT GGAGCT CAGAATA 7 99 

III II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 

Db 848 TTGCTGGTCAACCCTGGTACAGGTGCCGAACAGTTGCCTTGGAAACCGGGTTGCAGAACA 907 

Qy 800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859 

I I II I II I I I I I I I I I II I I I II I I I I I II I I I II III 
Db 908 CTCAGCTGTGTTCCACCATTGTGCAGCTTTCCTTCAGCCCTGAGGACCTCAACCTTGTGT 967 

Qy 860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919 

II I I I I I I I III I I I I I I I I I I Ml I I I I I I 

Db 968 TCACCTTCCCCCTCATCTACAGCATCTTCCAGATCGCCTTTGCAGCAATACTATTAGGAG 1027 

Qy 920 CATATCAGACGTACAAGA 937 

I I I I I I I I I I I I 

Db 1028 CTTATGTCGCATACAAGA 1045 



RESULT 6 
AAQ91109 

ID AAQ91109 standard; cDNA; 1047 BP. 
XX 

AC AAQ91109; 
XX 

DT 17-DEC-1995 (first entry) 
XX 

DE Human ileal/renal bile acid cotransporter. 
XX 

KW Ileal/renal bile acid cotransporter; therapeutic; gene therapy; 

KW diagnostic; ss. 

XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT CDS 1..1044 

FT /*tag= a 
XX 

PN WO9517905-A1. 
XX 

PD 06-JUL-1995. 
XX 

PF 29-DEC-1994; 94WO-US014431. 
XX 

PR 29-DEC-1993; 93US-00176126. 
XX 

PA (UYWA-) UNIV WAKE FOREST. 
XX 

PI Dawson PA; 
XX 

DR WPI; 1995-246189/32. 



DR P-PSDB; AAR77225. 
XX 

PT Hamster and human ileal and bile acid transport DNA and protein - useful 

PT in treatment of e.g. hypercholesterolaemia, diabetes and various 

PT digestive diseases, and in gene therapy to restore bile acid uptake 

PT activity. 

XX 

PS Claim 5; Page 107-111; 148pp; English. 
XX 

CC The ileal/renal bile acid cotransporter cDNA is cloned in an expression 

CC vector (plasmid pCMX or plasmid pCMV5) under the control of a baculo 

CC virus Autographa californica nuclear-polyhedrosis virus gene promoter, 

CC the cytomegalo virus immediate early gene promoter, the SV40 virus late 

CC gene promoter or an inducible promoter e.g. the lactose operon promoter, 

CC and expressed in CHO, MDCK, CaCo2, BHK or preferably COS-1A cells. The 

CC cotransporter is useful in the treatment of hypercholesterolaemia, 

CC diabetes, heart disease, liver disease and various digestive disorders. 

CC The cDNA may be used in gene therapy to restore bile acid uptake activity 

CC to patients whose ileum has been surgically resected for diseases such as 

CC Crohn disease, patients born with congenital defects in the bile 

CC transporter, and patients suffering from adult-onset chronic idiopathic 

CC bile acid diarrhoea. The DNA and protein may be used in screening methods 

CC as modulators of ileal/renal bile acid cotransport activity. The DNA can 

CC also be used to detect mutations and RFLPs in human ileal/renal bile acid 

CC cotransporter genes by amplification with primers (see AAQ91110-15) 
XX 

SQ Sequence 1047 BP; 251 A; 251 C; 255 G; 290 T; 0 U; 0 Other; 

Query Match 26.3%; Score 297.8; DB 2; Length 1047; 

Best Local Similarity 58.5%; Pred. No. 1.1e-82; 

Matches 518; Conservative 0; Mismatches 367; Indels 0; Gaps 0; 
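
The summary figures above can be cross-checked against each other. The sketch below assumes that Query Match is the score divided by the perfect score (1134, from the search header) and that Best Local Similarity is matches divided by aligned positions. Both definitions are inferred from the reported numbers, not taken from the search tool's documentation, and the function names are illustrative only.

```python
def query_match_pct(score: float, perfect_score: float) -> float:
    """Score as a percentage of the perfect (self-match) score. Assumed definition."""
    return 100.0 * score / perfect_score


def local_similarity_pct(matches: int, mismatches: int, indels: int = 0) -> float:
    """Matches as a percentage of aligned positions. Assumed definition."""
    aligned = matches + mismatches + indels
    return 100.0 * matches / aligned


# Figures reported for RESULT 6 (AAQ91109)
print(round(query_match_pct(297.8, 1134), 1))    # 26.3
print(round(local_similarity_pct(518, 367), 1))  # 58.5
```

Under these assumptions the computed values agree with the reported 26.3% and 58.5%.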

Qy 80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139 

| I MM III M I I I I M I II IN M I I I II 

Db 80 ATAACATCCTAAGTGTGGTCCTAAGTACGGTGCTGACCATCCTGTTGGCCTTGGTGATGT 139 

Qy 140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199 

I I I I I I I I I I I I I I I I I I I I Ml I I I I I I I I I I I I M M 

Db 140 TCTCCATGGGATGCAACGTGGAAATCAAGAAATTTCTAGGGCACATAAAGCGGCCGTGGG 199 

Qy 200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I 

Db 200 GCATTTGTGTTGGCTTCCTCTGTCAGTTTGGAATCATGCCCCTCACAGGATTCATCCTGT 259 

Qy 260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319 

| | MM I II I I I I I I I II I M II II I I II I 

Db 260 CGGTGGCCTTTGACATCCTCCCGCTCCAGGCCGTAGTGGTGCTCATTATAGGATGCTGCC 319 

Qy 320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379 

| M M I I I I I I I I I M I I I I I II I II I II I I I I I I I I I M I 

Db 320 CTGGAGGAACTGCCTCCAATATCTTGGCCTATTGGGTCGATGGCGACATGGACCTGAGCG 379 

Qy 380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439 

MM II M Mill M Mill MIMI II III I I 

Db 380 TCAGCATGACCACATGCTCCACACTGCTTGCCCTCGGAATGATGCCGCTGTGCCTCCTTA 439 

Qy 440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499 

1 1 1 1 II I III II II 1 1 1 1 1 1 1 1 I 1 1 1 1 1 1 1 1 I 

Db 440 TCTATACCAAAATGTGGGTCGACTCTGGGAGCATCGTAATTCCCTATGATAACATAGGTA 499 

Qy 500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559 

I I I I I II I I I I I I I I I I I I I I I I I I I II I M I I I I I 

Db 500 CATCTCTGGTTGCTCTCGTTGTTCCTGTTTCCATTGGAATGTTTGTTAATCACAAATGGC 559 

Qy 560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619 

I II II I II II I I I I I I I I I I I I I I I I I III I I I I I I I I I 

Db 560 CCCAAAAAGCAAAGATCATACTTAAAATTGGGTCCATCGCGGGCGCCATCCTCATTGTGC 619 

Qy 620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679 

I I I I I I I I I I I I M III I I I I I I M M 

Db 620 TCATAGCTGTGGTTGGAGGAATATTGTACCAAAGCGCCTGGATCATTGCTCCCAAACTGT 679 

Qy 680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739 

II II I I I I I I I I I I I II I I I I I I I I I I I I I I I 

Db 680 GGATTATAGGAACAATATTTCCTGTGGCGGGTTACTCCCTGGGGTTTCTTCTGGCTAGAA 739 

Qy 740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799 

M I I I I I I I II I I II I II MINIMUM I I I I I I 

Db 740 TTGCTGGTCTACCCTGGTACAGGTGCCGAACGGTTGCTTTTGAAACGGGGATGCAGAACA 799 

Qy 800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 

Db 800 CGCAGCTATGTTCCACCATCGTTCAGCTCTCCTTCACTCCTGAGGAGCTCAATGTCGTAT 859 

Qy 860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919 

II I I I I I I I III I I II I I I I I I II I I III 

Db 860 TCACCTTCCCGCTCATCTACAGCATTTTCCAGCTCGCCTTTGCCGCAATATTCTTAGGAT 919 

Qy 920 CATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGAAAAA 964 

I I I I I I I I I I I I I I I II I I I I I I 

Db 920 TTTATGTGGCATACAAGAAATGTCATGGAAAAAACAAGGCAGAAA 964 



RESULT 7 
ACF63388 

ID ACF63388 standard; DNA; 3779 BP. 
XX 

AC ACF63388; 
XX 

DT 09-OCT-2003 (first entry) 
XX 

DE Human IBAT gene SEQ ID NO: 110. 

XX 

KW Human; pharmacological; hypotensive; antilipaemic; vasotropic; laxative; 

KW dermatological; antidepressant; tranquilliser; antiinflammatory; eczema; 

KW antiulcer; antimigraine; neuroprotective; antiparkinsonian; analgesic; 

KW gynaecological; virucide; vulnerary; antiarthritic; antipsoriatic; cold; 

KW antimicrobial; cytostatic; litholytic; pathological disorder; depression; 

KW abnormal appetite; hypertension; hypercholesterolaemia; hyperlipidaemia; 

KW erectile dysfunction; anxiety; stress; inflammatory bowel syndrome; 

KW ulcerative colitis; Crohn's disease; renal stone; gall stone; migraine; 

KW constipation; headache; seizure; multiple sclerosis; polymyositis; 

KW fibromyalgia; Parkinson's disease; amyotrophic lateral sclerosis; trauma; 

KW chronic pain; pre-menstrual syndrome; sinusitis; carpal tunnel syndrome; 



KW chronic fatigue syndrome; rosacea; arthritis; psoriasis; prostatis; 

KW inflammation; heart burn; infection; colon cancer; malignant melanoma; 

KW skin disorder; gene; ds. 
XX 

OS Homo sapiens. 
XX 

PN WO2003006478-A1. 
XX 

PD 23-JAN-2003. 
XX 

PF 10-JUL-2002; 2002WO-US021664. 
XX 

PR 10-JUL-2001; 2001US-0303820P. 
XX 

PA (OLIG-) OLIGOS ETC INC. 
XX 

PI Dale RMK, Arrow A, Thompson T; 
XX 

DR WPI; 2003-221709/21. 
XX 

PT Composition with a modified oligonucleotide useful for treating a patient 

PT with a pathological disorder such as abnormal appetite, hypertension, 

PT eczema, anxiety, stress, and cancer. 
XX 

PS Claim 6; Page 130-132; 173pp; English. 
XX 

CC The present invention describes a composition (I) suitable for 

CC administration in a mammal, which comprises a modified oligonucleotide 

CC (II) of 7-75 nucleotides containing 7 or more contiguous ribose groups 

CC linked by achiral 5'-3' internucleoside phosphate linkages, where the 

CC modified oligonucleotide is complementary to a region of a gene 

CC associated with a pathological disorder. Also described: (1) a 

CC nutritional supplement comprising (II); and (2) a cosmetic composition 

CC comprising (II), where the modified oligonucleotide is complementary to a 

CC region of a gene associated with a skin disorder. (I) and (II) can have 

CC hypotensive, antilipaemic, vasotropic, dermatological, antidepressant, 

CC tranquilliser, antiinflammatory, antiulcer, laxative, antimigraine, 

CC neuroprotective, antiparkinsonian, analgesic, gynaecological, virucide, 

CC vulnerary, antiarthritic, antipsoriatic, antimicrobial, cytostatic and 

CC litholytic activities. (I) can be used for treating a patient with a 

CC pathological disorder selected from abnormal appetite, hypertension, 

CC hypercholesterolaemia, hyperlipidaemia, erectile dysfunction, eczema, 

CC depression, anxiety, stress, inflammatory bowel syndrome, ulcerative 

CC colitis, Crohn's disease, renal stones, gall stones, constipation, colds, 

CC migraine headache, seizure, multiple sclerosis, polymyositis, sinusitis, 

CC fibromyalgia, Parkinson's disease, amyotrophic lateral sclerosis (ALS), 

CC chronic pain, pre-menstrual syndrome, trauma, carpal tunnel syndrome, 

CC chronic fatigue syndrome, rosacea, arthritis, psoriasis, prostatis, 

CC inflammation, heart burn, infection, poison ivy, colon cancer, malignant 

CC melanoma, and malignant nasal polyps. The nutritional supplement is 

CC useful for supplementing the diet of an individual, and the cosmetic 

CC composition is useful for improving the appearance of the skin in an 

CC individual with a skin disorder. ACF63279 to ACF63410 represent 

CC nucleotide sequence given in the exemplification of the present invention 

XX 

SQ Sequence 3779 BP; 1117 A; 737 C; 799 G; 1126 T; 0 U; 0 Other; 



Query Match 26.3%; Score 297.8; DB 7; Length 3779; 

Best Local Similarity 58.5%; Pred. No. 2.3e-82; 

Matches 518; Conservative 0; Mismatches 367; Indels 0; Gaps 0; 

Qy 80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139 

II | M I I I I I I I I I I I I I I I I I I II I I I II 

Db 678 ATAACATCCTAAGTGTGGTCCTAAGTACGGTGCTGACCATCCTGTTGGCCTTGGTGATGT 737 

Qy 140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199 

I I I I I I I I I I I I I I I I I I II III I I I I I I I I I I I I I I I I 

Db 738 TCTCCATGGGATGCAACGTGGAAATCAAGAAATTTCTAGGGCACATAAAGCGGCCGTGGG 797 

Qy 200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259 

Mill I I I I I I I I I I I I I I I I I I I I I I I I I I I MM I II I I I 

Db 798 GCATTTGTGTTGGCTTCCTCTGTCAGTTTGGAATCATGCCCCTCACAGGATTCATCCTGT 857 

Qy 260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319 

I I I I M I II M I I I I II I I II I I I II I I I II II I I 

Db 858 CGGTGGCCTTTGACATCCTCCCGCTCCAGGCCGTAGTGGTGCTCATTATAGGATGCTGCC 917 

Qy 320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379 

I || M I I I I I I I I I II Ml I I I I I I M I I M I M II II M I 

Db 918 CTGGAGGAACTGCCTCCAATATCTTGGCCTATTGGGTCGATGGCGACATGGACCTGAGCG 977 

Qy 380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439 

II I I I I II I I I II I I II I II II I I I M I I II I I I . 

Db 978 TCAGCATGACCACATGCTCCACACTGCTTGCCCTCGGAATGATGCCGCTGTGCCTCCTTA 1037 

Qy 440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499 

|| || M I III M M I II II II I I M I I I II I I 

Db 1038 TCTATACCAAAATGTGGGTCGACTCTGGGAGCATCGTAATTCCCTATGATAACATAGGTA 1097 

Qy 500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559 

Mill II I I II I M I I I I II I I II I II I II I II I II 

Db 1098 CATCTCTGGTTGCTCTCGTTGTTCCTGTTTCCATTGGAATGTTTGTTAATCACAAATGGC 1157 

Qy 560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619 

I I I II I I I II I I I I I I I I M I II II I I Ml I I M I M I I 
Db 1158 CCCAAAAAGCAAAGATCATACTTAAAATTGGGTCCATCGCGGGCGCCATCCTCATTGTGC 1217 

Qy 620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679 

I I II I I I II I I I II I I I I I II I I M II 

Db 1218 TCATAGCTGTGGTTGGAGGAATATTGTACCAAAGCGCCTGGATCATTGCTCCCAAACTGT 1277 

Qy 680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739 

II II I I I I I I II I II II I I I II I I I M I II I I 

Db 1278 GGATTATAGGAACAATATTTCCTGTGGCGGGTTACTCCCTGGGGTTTCTTCTGGCTAGAA 1337 

Qy 740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799 

III I I I II I II I I II I II II I II II I II II II I I I I 

Db 1338 TTGCTGGTCTACCCTGGTACAGGTGCCGAACGGTTGCTTTTGAAACGGGGATGCAGAACA 1397 

Qy 800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859 

II I I M I II I II I I M I I I I II I II II II I I I II 

Db 1398 CGCAGCTATGTTCCACCATCGTTCAGCTCTCCTTCACTCCTGAGGAGCTCAATGTCGTAT 1457 



Qy 860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919 



II 1 1 1 1 1 1 1 III I I 1 1 1 1 1 1 1 1 II I I III 

Db 1458 TCACCTTCCCGCTCATCTACAGCATTTTCCAGCTCGCCTTTGCCGCAATATTCTTAGGAT 1517 

Qy 920 CATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGAAAAA 964 

I I I I I I I I I I I I I I I I I I 

Db 1518 TTTATGTGGCATACAAGAAATGTCATGGAAAAAACAAGGCAGAAA 1562 



RESULT 8 
ABZ20750 



ID ABZ20750 standard; DNA; 3779 BP. 
XX 

AC ABZ20750; 
XX 

DT 28-MAR-2003 (first entry) 
XX 

DE Human ileal sodium-dependent bile acid transporter gene fragment #1. 
XX 

KW Human; ileal sodium-dependent bile acid transporter gene; SLC10A2; SNP; 

KW single nucleotide polymorphism; chromosome 13q33; cardiant; 

KW antiarteriosclerotic; antilipemic; gene; ds. 
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT variation replace(582,G) 

FT /*tag= a 

FT variation replace(664,C) 

FT /*tag= b 

FT variation replace(727,T) 

FT /*tag= c 

FT variation replace(792,T) 

FT /*tag= d 

FT variation replace(890,A) 

FT /*tag= e 

FT variation replace(1073,A) 

FT /*tag= f 

FT variation replace(1103,T) 

FT /*tag= g 

FT variation replace(1384,T) 

FT /*tag= h 

FT variation replace(1466,T) 

FT /*tag= i 

FT variation replace(1484,C) 

FT /*tag= j 

FT variation replace(1545,A) 

FT /*tag= k 

FT variation replace(1646,T) 

FT /*tag= l 

FT variation replace(1683,C) 

FT /*tag= m 

FT variation replace(1765,C) 

FT /*tag= n 

XX 

PN WO200283944-A2. 
XX 

PD 24-OCT-2002. 



XX 

PF 11-APR-2002; 2002WO-GB001681. 
XX 

PR 17-APR-2001; 2001GB-00009296. 

PR 19-APR-2001; 2001US-0284530P. 
XX 

PA (ASTR ) ASTRAZENECA AB. 

PA (ASTR ) ASTRAZENECA UK LTD. 

XX 

PI Morten JEN; 
XX 

DR WPI; 2003-046927/04. 
XX 

PT Diagnosing polymorphism in SLC10A2 in a human for assessing the 

PT pharmacogenetics of a drug for treating cardiovascular and hyperlipidemic 

PT conditions, by determining the status of the human by reference to 

PT polymorphism in SLC10A2. 

XX 

PS Claim 4; Page 19-20; 21pp; English. 
XX 

CC The present invention relates to a method of diagnosing polymorphisms in 

CC SLC10A2 (human ileal sodium-dependent bile acid transporter gene) in a 

CC human, which involves determining the status of the human by reference to 

CC polymorphisms in SLC10A2. The method is useful for assessing the 

CC pharmacogenetics of a drug acting at SLC10A2. The SLC10A2 gene 

CC polymorphism is useful as a genetic marker in a linkage study. SLC10A2 

CC drugs are also useful for treating cardiovascular (e.g. atherosclerosis) 

CC and hyperlipidemic conditions. The SLC10A2 gene is found at chromosome 

CC 13q33. The present sequence is a fragment of the gene of the invention 

CC containing polymorphisms 

XX 

SQ Sequence 3779 BP; 1117 A; 737 C; 799 G; 1126 T; 0 U; 0 Other; 
Query Match 26.3%; Score 297.8; DB 7; Length 3779; 
Best Local Similarity 58.5%; Pred. No. 2.3e-82; 

Matches 518; Conservative 0; Mismatches 367; Indels 0; Gaps 0; 

Qy 80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139 

|| | | I I Mill I I I I I I I I I I I I I I I I I I I 

Db 678 ATAACATCCTAAGTGTGGTCCTAAGTACGGTGCTGACCATCCTGTTGGCCTTGGTGATGT 737 



Qy 140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199 

I I I I II I II I I I I I I I I I I I Ml I I I I I II I I I I I I I I I 

Db 738 TCTCCATGGGATGCAACGTGGAAATCAAGAAATTTCTAGGGCACATAAAGCGGCCGTGGG 797 

Qy 200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259 

I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I M I I 

Db 798 GCATTTGTGTTGGCTTCCTCTGTCAGTTTGGAATCATGCCCCTCACAGGATTCATCCTGT 857 

Qy 260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319 

| | | | I I I II I I I I I I II I I I I I I I I I I I I I I I I I I 

Db 858 CGGTGGCCTTTGACATCCTCCCGCTCCAGGCCGTAGTGGTGCTCATTATAGGATGCTGCC 917 



Qy 320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379 

I M I I II I I I I I II II III I I I I I I I I I I I I I I I I I I I I M 
Db 918 CTGGAGGAACTGCCTCCAATATCTTGGCCTATTGGGTCGATGGCGACATGGACCTGAGCG 977 



Qy 380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439 

I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I - I I I I I I I 

Db 978 TCAGCATGACCACATGCTCCACACTGCTTGCCCTCGGAATGATGCCGCTGTGCCTCCTTA 1037 

Qy 440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499 

M | I I I I III II II I I M I I I I I I 

Db 1038 TCTATACCAAAATGTGGGTCGACTCTGGGAGCATCGTAATTCCCTATGATAACATAGGTA 1097 

Qy 500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559 

| | | | | II I I I I I I I I I I I I I I I II I I I I I I I I I I I I 

Db 1098 CATCTCTGGTTGCTCTCGTTGTTCCTGTTTCCATTGGAATGTTTGTTAATCACAAATGGC 1157 

Qy 560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619 

I | | I I I I I I I I I I I I I I I I I I I I I I I I Ml I I I I I I t I I 

Db 1158 CCCAAAAAGCAAAGATCATACTTAAAATTGGGTCCATCGCGGGCGCCATCCTCATTGTGC 1217 

Qy 620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679 

I I I I I I I I I I I I I I I I II II 

Db 1218 TCATAGCTGTGGTTGGAGGAATATTGTACCAAAGCGCCTGGATCATTGCTCCCAAACTGT 1277 

Qy 680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739 

II II I I I I I I I I I I I I I I I I I I I I I I 

Db 1278 GGATTATAGGAACAATATTTCCTGTGGCGGGTTACTCCCTGGGGTTTCTTCTGGCTAGAA 1337 

Qy 740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799 

| | I | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1338 TTGCTGGTCTACCCTGGTACAGGTGCCGAACGGTTGCTTTTGAAACGGGGATGCAGAACA 1397 

Qy 800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859 

I I I I II MINI I I I I I. I I I I I I I I I I I I I I I II 
Db 1398 CGCAGCTATGTTCCACCATCGTTCAGCTCTCCTTCACTCCTGAGGAGCTCAATGTCGTAT 1457 

Qy 860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919 

I I Mill II I I I I I I I I I I I I I II I I I I I 

Db 1458 TCACCTTCCCGCTCATCTACAGCATTTTCCAGCTCGCCTTTGCCGCAATATTCTTAGGAT 1517 

Qy 920 CATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGAAAAA 964 

I II I I I I I I I I I I II II Ml IN. 

Db 1518 TTTATGTGGCATACAAGAAATGTCATGGAAAAAACAAGGCAGAAA 1562 



RESULT 9 
ADB58285 

ID ADB58285 standard; DNA; 4269 BP. 
XX 

AC ADB58285; 
XX 

DT 04-DEC-2003 (first entry) 
XX 

DE Toxicity-related gene, SEQ ID 3311. 
XX 

KW Toxic; toxin; gene expression profile; hepatotoxicity; liver; 

KW drug screening; toxicity assay; ds. 

XX 

OS Unidentified. 
XX 

PN WO2003064624-A2. 



XX 

PD 07-AUG-2003. 
XX 

PF 31-JAN-2003; 2003WO-US003194. 
XX 

PR 31-JAN-2002; 2002US-00060087. 

PR 15-MAR-2002; 2002US-0364045P. 

PR 15-MAR-2002; 2002US-0364055P. 

PR 30-DEC-2002; 2002US-0436643P. 
XX 

PA (GENE-) GENE LOGIC INC. 
XX 

PI Mendrick D, Porter M, Johnson K, Higgs B, Castle A, Elashoff M; 
XX 

DR WPI; 2003-689530/65. 
XX 

PT Predicting a toxic effect of a compound, useful in identifying toxicity 

PT markers in liver tissues or cells for drug screening and toxicity assays, 

PT comprises preparing gene expression profile of tissue or cells exposed to 

PT the compound. 
XX 

PS Claim 1; SEQ ID NO 3311; 1156pp; English. 
XX 

CC The present invention relates to a method for predicting a toxic effect 

CC of a compound. The method comprises preparing a gene expression profile 

CC of a tissue or cell sample exposed to the compound, and comparing the 

CC gene expression profile to a database comprising SEQ ID 1-4925, where 

CC differential expression of the gene indicates at least one toxic effect. 

CC The method is useful for predicting at least one toxic effect of a 

CC compound, predicting hepatotoxicity or the progression of a toxic effect 

CC of a compound, identifying an agent that modulates the onset or 

CC progression of a toxic response, predicting the cellular pathways that a 

CC compound modulates in a cell, and identifying an agent that modulates at 

CC least one activity of a protein. The method and compositions of the 

CC present invention using a database of genes having liver toxin-induced 

CC differential expression, are useful in identifying toxicity markers in 

CC liver tissues or cells for drug screening and toxicity assays. Note: The 

CC sequence data for this patent did not form part of the printed 

CC specification, but was obtained in electronic format directly from WIPO 

CC at ftp.wipo.int/pub/published_pct_sequences. 

XX 

SQ Sequence 4269 BP; 1315 A; 780 C; 850 G; 1324 T; 0 U; 0 Other; 
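
A quick consistency check on an SQ line like the one above is to confirm that the per-base counts add up to the stated length. The following is a minimal sketch that assumes the line layout seen in this listing (`SQ Sequence <N> BP; <count> <base>; ...`); `parse_sq_line` is a hypothetical helper written for illustration, not part of any search tool.

```python
import re


def parse_sq_line(line: str) -> tuple[int, dict[str, int]]:
    """Return (stated length, per-base counts) from an SQ summary line."""
    length = int(re.search(r"Sequence\s+(\d+)\s+BP", line).group(1))
    # Only look at the text after "BP;" so the length is not read as a count.
    _, _, composition = line.partition("BP;")
    counts = {base: int(n) for n, base in re.findall(r"(\d+)\s+([A-Za-z]+);", composition)}
    return length, counts


sq = "SQ Sequence 4269 BP; 1315 A; 780 C; 850 G; 1324 T; 0 U; 0 Other;"
length, counts = parse_sq_line(sq)
print(length == sum(counts.values()))  # True
```

A mismatch between the two numbers would suggest that a digit on the line was misread during extraction.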



Query Match 25.9%; Score 293.2; DB 9; Length 4269; 

Best Local Similarity 59.1%; Pred. No. 7.1e-81; 

Matches 502; Conservative 0; Mismatches 348; Indels 0; Gaps 0; 

Qy 109 GTGTCCACTGTGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGG 168 

I 11 III I I II II I I I I ! I I I I I I II I I I I I I I I I I 

Db 216 GTGCTCACCATTCTTCTAGCCATGGTGATGTTTTCTATGGGGTGCAATGTGGAAATCAAC 275 

Qy 169 AAGCTGTGGTCGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTT 228 

I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 276 AAGTTCCTAGGACACATAAAGCGGCCATGGGGCATCTTCGTGGGCTTCCTCTGTCAGTTT 335 

Qy 229 GGGCTCATGCCTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAA 288 

II I I I I I I I I I I I I I I I I I II I I I III I I I I I I I 



Db 336 GGAATCATGCCTCTCACAGGATTTATCCTGTCTGTGGCCTCTGGCATCCTTCCTGTGCAG 395 

Qy 289 GCTATTGCTGTTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACC 348 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I II 

Db 396 GCTGTGGTGGTGCTAATTATGGGTTGCTGCCCTGGAGGAACTGGCTCCAATATCCTGGCC 455 

Qy 349 TTCTGGGTTGATGGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCC 408 

I I M I I I M I I I I I I I I II I I I I I I I I II I I I I I I I I I I II 

Db 456 TATTGGATAGATGGTGACATGGACCTCAGTGTTAGCATGACCACTTGCTCCACACTGCTT 515 

Qy 409 GCCCTGGGAATGATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAG 468 

I I I I I I I I I I I I I II I I II I III I 

Db 516 GCTCTTGGAATGATGCCCCTTTGCCTCTTCATCTATACCAAGATGTGGGTTGACTCAGGA 575 

Qy 469 AATCTCACCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTG 528 

I I I I II I I I I III II III I II II II I I I I II I I 

Db 576 ACGATTGTGATCCCCTACGATAGCATTGGCATTTCTCTGGTTGCGCTTGTTATTCCTGTT 635 

Qy 529 GCCTTTGGTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATT 588 

I I I I I I | I I I I I I I I I I I I I I I I II II I I I I I I I I I M I I I 

Db 636 TCCATTGGAATGTTTGTAAATCACAAATGGCCCCAAAAAGCGAAGATTATACTTAAAATT 695 

Qy 589 GGGGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCG 648 

II Mil MM I I I I II I I I I I I I I I I II I I I I I 

Db 696 GGATCCATCGCAGGTGCAATTCTCATTGTGCTCATAGCTGTGGTTGGAGGAATACTGTAC 755 

Qy 649 AAAGGATCTTGGAATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATT 708 

M | I I I I I II M M M M I M I M I I I I I 

Db 756 CAAAGTGCCTGGATCATTGAACCCAAACTATGGATTATAGGAACAATATTTCCTATAGCT 815 

Qy 709 GGCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGG 768 

II I I I I II II I I I II I II I I I I Ml 

Db 816 GGCTACAGCCTTGGTTTCTTCCTGGCTAGACTAGCTGGTCAACCCTGGTACAGGTGCCGA 875 

Qy 769 ACAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTA 828 

M I II I I II I II I I II I I M I I I I I I I II I II III I 

Db 876 ACAGTTGCCTTGGAAACTGGAATGCAGAACACTCAACTGTGTTCCACCATTGTACAACTC 935 

Qy 829 TCTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTC 888 

M M I II I II I I M I I I I I II I I I II I I II I I I M 

Db 936 TCCTTTAGCCCTGAGGATCTCAACCTTGTGTTCACCTTCCCACTCATCTATACTGTTTTC 995 

Qy 889 CAGCTGATAGATGGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAG 948 

I I M I I III III III Ml I I II I II I I I 

Db 996 CAGCTCGTCTTTGCAGCAATAATATTAGGAATGTATGTCACATACAAGAAATGTCATGGA 1055 



Qy 949 AACAAACATG 958 

I I I I II I 
Db 1056 AAAAATGATG 1065 



RESULT 10 
ADB52825 

ID ADB52825 standard; DNA; 4269 BP. 
XX 

AC ADB52825; 
XX 



DT 04-DEC-2003 (first entry) 
XX 

DE Primary rat hepatocyte toxicity modelling related gene SEQ ID NO: 3367. 
XX 

KW toxic effect; gene expression profile; hepatotoxicity; diagnostic marker; 

KW toxicity marker; toxicity progression; drug screening; 

KW primary rat hepatocyte toxicity modelling; gene; ds. 
XX 

OS Rattus norvegicus. 
XX 

PN WO2003065993-A2. 
XX 



PD 14-AUG-2003. 
XX 

PF 04-FEB-2003; 2003WO-US003482. 
XX 

PR 04-FEB-2002; 2002US-0353171P. 

PR 13-MAR-2002; 2002US-0363534P. 

PR 08-APR-2002; 2002US-0370248P. 

PR 10-APR-2002; 2002US-0371134P. 

PR 10-APR-2002; 2002US-0371135P. 

PR 10-APR-2002; 2002US-0371150P. 

PR 11-APR-2002; 2002US-0371413P. 

PR 19-APR-2002; 2002US-0373601P. 

PR 19-APR-2002; 2002US-0373602P. 

PR 22-APR-2002; 2002US-0374139P. 

PR 08-MAY-2002; 2002US-0378370P. 

PR 09-MAY-2002; 2002US-0378652P. 

PR 09-MAY-2002; 2002US-0378653P. 

PR 09-MAY-2002; 2002US-0378665P. 

PR 09-JUL-2002; 2002US-0394230P. 

PR 09-JUL-2002; 2002US-0394253P. 

PR 04-SEP-2002; 2002US-0407688P. 

PR 28-JAN-2003; 2003US-0442900P. 



XX 

PA (GENE-) GENE LOGIC INC. 
XX 

PI Mendrick D, Porter M, Johnson K, Higgs B, Castle A, Orr M; 

PI Elashoff M; 

XX 

DR WPI; 2003-731472/69. 
XX 

PT Determining if a compound induces a toxic effect on a tissue or cell, for 

PT identifying hepatotoxic compounds, comprises comparing a gene expression 

PT profile of a tissue or cell sample to a database of Tox mean and non-Tox 

PT mean values. 
XX 

PS Claim 44; SEQ ID NO 3367; 874pp; English. 
XX 

CC The present invention describes a method for determining whether a 

CC compound induces a toxic effect on a tissue or cell. The method comprises 

CC preparing a gene expression profile of a tissue or cell sample exposed to 

CC the compound, and comparing the gene expression profile to a database 

CC comprising data or information on the Tox mean and non-Tox mean value. 

CC The method is useful for predicting or identifying at least one toxic 

CC effect, particularly hepatotoxicity, of a test or unknown compound. The 

CC genes listed in the specification are useful as diagnostic or toxicity 



CC markers for the prediction or identification of the physiological state 
CC of tissue or cell sample that has been exposed to a compound, or to 
CC identify or predict the toxic effects of a compound or an agent. These 
CC may also be used as markers for monitoring toxicity progression or for 
CC drug screening. The present sequence represents a primary rat hepatocyte 
CC toxicity modelling related gene sequence from the present invention. 
XX 

SQ Sequence 4269 BP; 1315 A; 780 C; 850 G; 1324 T; 0 U; 0 Other; 
Query Match 25.9%; Score 293.2; DB 9; Length 4269; 
Best Local Similarity 59.1%; Pred. No. 7.1e-81; 

Matches 502; Conservative 0; Mismatches 348; Indels 0; Gaps 0; 

Qy 109 GTGTCCACTGTGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGG 168 

Ml Ml I I II I I I I I I I I I I I I I I I I I I II I I I I I 

Db 216 GTGCTCACCATTCTTCTAGCCATGGTGATGTTTTCTATGGGGTGCAATGTGGAAATCAAC 275 

Qy 169 AAGCTGTGGTCGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTT 228 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 276 AAGTTCCTAGGACACATAAAGCGGCCATGGGGCATCTTCGTGGGCTTCCTCTGTCAGTTT 335 

Qy 229 GGGCTCATGCCTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAA 288 

II I I I I I I I I I I I I I I I I I I I I I I Ml I II I I I I 

Db 336 GGAATCATGCCTCTCACAGGATTTATCCTGTCTGTGGCCTCTGGCATCCTTCCTGTGCAG 395 

Qy 289 GCTATTGCTGTTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACC 348 

I I I I I I I I I II I I I II I I I I II I! I I I I I I 111 II II I M 

Db 396 GCTGTGGTGGTGCTAATTATGGGTTGCTGCCCTGGAGGAACTGGCTCCAATATCCTGGCC 455 

Qy 349 TTCTGGGTTGATGGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCC 408 

I Ml I I I I I I II I I I I I I I M I I II I I I I I I I I I I I I I I II 
Db 456 TATTGGATAGATGGTGACATGGACCTCAGTGTTAGCATGACCACTTGCTCCACACTGCTT 515 

Qy 409 GCCCTGGGAATGATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAG 468 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I 

Db 516 GCTCTTGGAATGATGCCCCTTTGCCTCTTCATCTATACCAAGATGTGGGTTGACTCAGGA 575 

Qy 469 AATCTCACCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTG 528 

I I I I I I II I I I I I I I I I I I I I II II MINIM 

Db 576 ACGATTGTGATCCCCTACGATAGCATTGGCATTTCTCTGGTTGCGCTTGTTATTCCTGTT 635 

Qy 529 GCCTTTGGTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATT 588 

I I I II I I I I M M I II I II II II II I I I I I II I I I I M I II 
Db 636 TCCATTGGAATGTTTGTAAATCACAAATGGCCCCAAAAAGCGAAGATTATACTTAAAATT 695 

Qy 589 GGGGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCG 648 

II I I I I I I I I I I II M M I I II II I I II I I Ml 

Db 696 GGATCCATCGCAGGTGCAATTCTCATTGTGCTCATAGCTGTGGTTGGAGGAATACTGTAC 755 

Qy 649 AAAGGATCTTGGAATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATT 708 

Ml I I II I M II II M II I I I II II M I I 

Db 756 CAAAGTGCCTGGATCATTGAACCCAAACTATGGATTATAGGAACAATATTTCCTATAGCT 815 

Qy 709 GGCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGG 768 

MM I Mill I M II I I I II I M I I M M I I I 

Db 816 GGCTACAGCCTTGGTTTCTTCCTGGCTAGACTAGCTGGTCAACCCTGGTACAGGTGCCGA 875 



Qy 769 ACAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTA 828 

I I I I I I II I I I I I I I I M MINIMI II II I I II I I IN I 

Db 876 ACAGTTGCCTTGGAAACTGGAATGCAGAACACTCAACTGTGTTCCACCATTGTACAACTC 935 

Qy 829 TCTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTC 888 

I I I I I II I M I I II I I I I I I I II I M I MM MM 

Db 936 TCCTTTAGCCCTGAGGATCTCAACCTTGTGTTCACCTTCCCACTCATCTATACTGTTTTC 995 

Qy 889 CAGCTGATAGATGGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAG 948 

Mill I III III I I I II I II I II I II I I 

Db 996 CAGCTCGTCTTTGCAGCAATAATATTAGGAATGTATGTCACATACAAGAAATGTCATGGA 1055 

Qy 949 AACAAACATG 958 

I I I I Ml 
Db 1056 AAAAATGATG 1065 



RESULT 11 




ABK63719 




ID ABK63719 standard; cDNA; 1663 BP. 
XX 

AC ABK63719; 
XX 

DT 18-JUN-2002 (first entry) 
XX 

DE Rat sequence differentially expressed in response to a hepatotoxin #1626. 
XX 

KW Rat; ss; hepatotoxin; expressed sequence tag; EST; drug screening; 

KW differential expression; centrilobular necrosis; steatosis. 
XX 

OS Rattus norvegicus. 
XX 

PN WO200210453-A2. 
XX 

PD 07-FEB-2002. 
XX 

PF 30-JUL-2001; 2001WO-US023872. 
XX 

PR 31-JUL-2000; 2000US-0222040P. 

PR 02-NOV-2000; 2000US-0244880P. 

PR 11-MAY-2001; 2001US-0290029P. 

PR 15-MAY-2001; 2001US-0290645P. 

PR 22-MAY-2001; 2001US-0292336P. 

PR 06-JUN-2001; 2001US-0295798P. 

PR 13-JUN-2001; 2001US-0297457P. 

PR 19-JUN-2001; 2001US-0298884P. 

PR 09-JUL-2001; 2001US-0303459P. 
XX 

PA (GENE-) GENE LOGIC INC. 
XX 

PI Mendrick D, Porter MW, Johnson KR, Castle AL, Elashoff MR; 
XX 

DR WPI; 2002-241625/29. 
XX 

PT Predicting toxic effects of compounds or the progression of these toxic 

PT effects by determining the changes in gene expression in tissues or cells 

PT exposed to the toxin and comparing these to gene expression in unexposed 

PT tissues or cells. 
XX 

PS Claim 1; SEQ ID NO 1626; 239pp; English. 
XX 

CC The invention relates to methods for predicting toxic effects of 

CC compounds or the progression of these toxic effects by determining the 

CC global changes in gene expression in tissues or cells exposed to the 

CC toxin and comparing these to gene expression in unexposed tissues or 

CC cells. Also included are methods of predicting at least one toxic effect 

CC of a compound or progression of a toxic effect, preferably the 

CC hepatotoxicity of a compound, comprising detecting the level of 

CC expression in a tissue or cell sample exposed to the compound of two or 

CC more genes listed in the specification, where differential expression of 

CC the genes is indicative of at least one toxic effect or progression. The 

CC method can also be used to identify an agent which modulates the toxic 

CC response and predict cellular pathways that a compound modulates in a 

CC cell. The methods utilise a set of at least two probes (on a solid 

CC support in kit form), where each of the probes comprises a sequence that 

CC specifically hybridises to a gene listed in the specification, a computer 

CC system comprising a database containing information identifying the 

CC expression level in a tissue or cell sample exposed to a hepatotoxin of a 

CC set of genes comprising at least two genes listed in the specification, 

CC and a user interface to view the information used to present information 

CC identifying the expression level in a tissue or cell of at least one gene 

CC listed in the specification. The method is useful for elucidating global 

CC changes in gene expression and for identifying toxicity markers in 

CC tissues or cell exposed to a known toxin. The genes may be used as 

CC toxicity markers in drug screening and toxicity assays. The genes and 

CC gene expression information may be used as diagnostic markers for the 

CC prediction or identification of the physiological state of tissue or cell 

CC sample that has been exposed to a compound or agent. Hepatotoxicity is 

CC characterised by centrilobular necrosis and steatosis. The present 

CC sequence is an expressed sequence tag (EST) or cDNA derived from a gene 

CC which is differentially expressed in response to a hepatotoxic agent 

XX 

SQ Sequence 1663 BP; 450 A; 460 C; 325 G; 428 T; 0 U; 0 Other; 



Query Match 16.2%; Score 183.2; DB 6; Length 1663; 

Best Local Similarity 53.6%; Pred. No. 1.9e-46; 

Matches 430; Conservative 0; Mismatches 363; Indels 9; Gaps 2; 

Qy 119 TGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGT 178 

I I I I I I Ml I I I I I I I I I I II I I I MINI I I I I I I I I 
Db 219 TAATGTTGCTGCTTATCATGCTCTCACTGGGCTGCACCATGGAATTCAGCAAGATCAAGG 278 

Qy 179 CGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGC 238 

I I I I I III III II I III II I I I II I I I I I I II I I I 

Db 279 CTCACTTGTGGAAGCCCAAAGGGGTGATCGTTGCCTTGGTGGCCCAGTTTGGCATCATGC 338 

Qy 239 CTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTG 298 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 339 CCCTCGCTGCTTTTCTTCTCGGCAAGATCTTTCACCTGAGCAACATTGAAGCTCTGGCCA 398 

Qy 299 TTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTG 358 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 399 TCCTCATCTGTGGCTGCTCTCCCGGGGGGAACTTGTCCAACCTCTTCACCCTGGCCATGA 458 



Qy 359 ATGGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAA 418 

I M || Ml I I I I I I I I I I II I I I I I I I I I I I I I i Ml I I I I I 

Db 459 AGGGGGACATGAACCTCAGCATCGTGATGACCACCTGCTCCAGCTTCAGTGCCTTGGGCA 518 

Qy 419 T GAT GC CACT CT GC AT T T AT CT CT ACAC C TGGTCCTGGAGTCTTCAGCAGAATCTCA 475 

I I I I I I I I I I I I I I I M I I I I I I II I I I M II 

Db 519 TGATGCCACTCCTCTTATACGTCTACAGCAAAGGCATCTACGATGGAGACCTTAAGGACA 578 

Qy 476 CCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTG 535 

I I I I I I III II I II I I I I I I I I I I I I I I 

Db 579 AGGTGCCCTACAAAGGCATTATGATATCACTAGTCATAGTTCTCATTCCTTGCACCATAG 638 

Qy 536 GTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCG 595 

I II I I I I I I I I I I I I I I MINIMI II 

Db 639 GGATCGTCCTCAAGTCCAAAAGGCCACACTATGTACCCTACATCCTCAAGGGAGGCATGA 698 

Qy 596 TTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGAT 655 

I I I I I I I I I I I Ml III III II II 

Db 699 TCATCACCTTCCTCCTCTCTGTGGCTGTCACAGCCCTCTCTGTCATCAATGTGGGCAACA 758 

Qy 656 CTT GGAAT T CAGACAT CAC CCTTCTGACCATCAGTTTCATCTTTCCTTTGATTG 709 

I I I II I I I I I III I I I III I I I II II 

Db 759 GCATCATGTTCGTCATGACACCACACTTACTGGCTACCTCCTCCCTGATGCCCTTCTCTG 818 

Qy 710 GCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGA 769 

I I I I I I I M I I II I I I I II I I I I I I 

Db 819 GCTTTCTGATGGGTTACATTCTCTCTGCTCTCTTCCAACTCAATCCAAGCTGCAGACGCA 878 

Qy 770 CAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTAT 829 

III I I II I I I I II I I I I I I I I I III M II I I I I I I 

Db 879 CCATCAGCATGGAAACAGGATTCCAAAACATTCAACTCTGTTCTACCATCCTCAATGTGA 938 

Qy 830 CTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCC 889 

I II I I II I I III i III M M II III I I I II 

Db 939 CCTTCCCCCCTGAAGTCATTGGGCCACTTTTCTTCTTTCCTCTCCTCTACATGATTTTCC 998 

Qy  890 AGCTGATAGATGGATTTCTTAT 911 
        ||||   ||| ||| |||| ||
Db  999 AGCTTGCAGAAGGACTTCTCAT 1020 
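
The percentages in the summary lines of these results appear to be derivable from the other reported figures. A minimal sketch in Python, assuming that Query Match is the score divided by the perfect score (1134 for this query) and that Best Local Similarity is the number of matches divided by the number of aligned columns (matches + mismatches + indels). The function name summary_percentages is hypothetical and is not part of GenCore.

```python
def summary_percentages(score, perfect_score, matches, mismatches, indels):
    """Return (query_match_pct, best_local_similarity_pct), rounded to 0.1.

    Assumed definitions (not confirmed by the report itself):
      query match           = score / perfect score
      best local similarity = matches / (matches + mismatches + indels)
    """
    query_match = 100.0 * score / perfect_score
    aligned_columns = matches + mismatches + indels
    best_local_similarity = 100.0 * matches / aligned_columns
    return round(query_match, 1), round(best_local_similarity, 1)


if __name__ == "__main__":
    # Figures as reported for RESULT 11 (Score 183.2; 430/363/9).
    print(summary_percentages(183.2, 1134, 430, 363, 9))
```

With the figures reported for RESULT 11 (Score 183.2; Matches 430; Mismatches 363; Indels 9) this gives 16.2 and 53.6, in agreement with the printed Query Match and Best Local Similarity values.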



RESULT 12 
ADB58234 

ID ADB58234 standard; DNA; 1663 BP. 
XX 

AC ADB58234; 
XX 

DT 04-DEC-2003 (first entry) 
XX 

DE Toxicity-related gene, SEQ ID 3260. 
XX 

KW Toxic; toxin; gene expression profile; hepatotoxicity; liver; 

KW drug screening; toxicity assay; ds. 

XX 

OS Unidentified. 
XX 

PN WO2003064624-A2. 



XX 

PD 07-AUG-2003. 
XX 

PF 31-JAN-2003; 2003WO-US003194. 
XX 

PR 31-JAN-2002; 2002US-00060087. 

PR 15-MAR-2002; 2002US-0364045P. 

PR 15-MAR-2002; 2002US-0364055P. 

PR 30-DEC-2002; 2002US-0436643P. 
XX 

PA (GENE-) GENE LOGIC INC. 
XX 

PI Mendrick D, Porter M, Johnson K, Higgs B, Castle A, Elashoff M; 
XX 

DR WPI; 2003-689530/65. 
XX 

PT Predicting a toxic effect of a compound, useful in identifying toxicity 

PT markers in liver tissues or cells for drug screening and toxicity assays, 

PT comprises preparing gene expression profile of tissue or cells exposed to 

PT the compound. 
XX 

PS Claim 1; SEQ ID NO 3260; 1156pp; English. 
XX 

CC The present invention relates to a method for predicting a toxic effect 

CC of a compound. The method comprises preparing a gene expression profile 

CC of a tissue or cell sample exposed to the compound, and comparing the 

CC gene expression profile to a database comprising SEQ ID 1-4925, where 

CC differential expression of the gene indicates at least one toxic effect. 

CC The method is useful for predicting at least one toxic effect of a 

CC compound, predicting hepatotoxicity or the progression of a toxic effect 

CC of a compound, identifying an agent that modulates the onset or 

CC progression of a toxic response, predicting the cellular pathways that a 

CC compound modulates in a cell, and identifying an agent that modulates at 

CC least one activity of a protein. The method and compositions of the 

CC present invention using a database of genes having liver toxin-induced 

CC differential expression, are useful in identifying toxicity markers in 

CC liver tissues or cells for drug screening and toxicity assays. Note: The 

CC sequence data for this patent did not form part of the printed 

CC specification, but was obtained in electronic format directly from WIPO 

CC at ftp.wipo.int/pub/published_pct_sequences. 

XX 

SQ Sequence 1663 BP; 450 A; 460 C; 325 G; 428 T; 0 U; 0 Other; 

Query Match 16.2%; Score 183.2; DB 9; Length 1663; 
Best Local Similarity 53.6%; Pred. No. 1.9e-46; 

Matches 430; Conservative 0; Mismatches 363; Indels 9; Gaps 2; 

Qy 119 TGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGT 178 

I I I I I I III I II I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 219 TAATGTTGCTGCTTATCATGCTCTCACTGGGCTGCACCATGGAATTCAGCAAGATCAAGG 278 

Qy 179 CGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGC 238 

I I M I III III II I III III I I I I I I I I I I I I I I I 

Db 279 CTCACTTGTGGAAGCCCAAAGGGGTGATCGTTGCCTTGGTGGCCCAGTTTGGCATCATGC 338 

Qy 239 CTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTG 298 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 



Db 339 CCCTCGCTGCTTTTCTTCTCGGCAAGATCTTTCACCTGAGCAACATTGAAGCTCTGGCCA 398 



Qy 299 TTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTG 358 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 

Db 399 TCCTCATCTGTGGCTGCTCTCCCGGGGGGAACTTGTCCAACCTCTTCACCCTGGCCATGA 458 

Qy 359 ATGGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAA 418 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 459 AGGGGGACATGAACCTCAGCATCGTGATGACCACCTGCTCCAGCTTCAGTGCCTTGGGCA 518 

Qy 419 T GAT GC C ACT CT GCAT T TAT CT CT AC AC C T GGT C CT GGAGT CTT CAG CAGAAT CT C A 47 5 

I I I I I I I II I I I I I I I I I I I I I I M I I I I I M 

Db 519 TGATGCCACTCCTCTTATACGTCTACAGCAAAGGCATCTACGATGGAGACCTTAAGGACA 578 

Qy 476 CCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTG 535 

I I I I I I III I I I II I I I I I I I I II II I I 

Db 579 AGGTGCCCTACAAAGGCATTATGATATCACTAGTCATAGTTCTCATTCCTTGCACCATAG 638 

Qy 536 GTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCG 595 

I II I I I I I I I I I I I I I I in linn M 

Db 639 GGATCGTCCTCAAGTCCAAAAGGCCACACTATGTACCCTACATCCTCAAGGGAGGCATGA 698 

Qy 596 TTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGAT 655 

II I I I I I I I i I I I I I I I I I I I I II 

Db 699 TCATCACCTTCCTCCTCTCTGTGGCTGTCACAGCCCTCTCTGTCATCAATGTGGGCAACA 758 

Qy 656 C T T G GAAT T C AGAC AT CAC CCTTCTGACCATCAGTTTCATCTTTCCTTTGATTG 709 

I I I I I I I I I I I I I I I I III I I I I I M 

Db 759 GCATCATGTTCGTCATGACACCACACTTACTGGCTACCTCCTCCCTGATGCCCTTCTCTG 818 

Qy 710 GCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGA 769 

II M Ml I I I I I I 

Db 819 GCTTTCTGATGGGTTACATTCTCTCTGCTCTCTTCCAACTCAATCCAAGCTGCAGACGCA 878 

Qy 770 CAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTAT 829 

III I I I I I I I I I I I I II I I I I I III I I I I I I I I I I 

Db 879 CCATCAGCATGGAAACAGGATTCCAAAACATTCAACTCTGTTCTACCATCCTCAATGTGA 938 

Qy 830 CTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCC 889 

I I I I I I I I I Ml I Ml M II II III I I I I I 

Db 939 CCTTCCCCCCTGAAGTCATTGGGCCACTTTTCTTCTTTCCTCTCCTCTACATGATTTTCC 998 

Qy  890 AGCTGATAGATGGATTTCTTAT 911 
        ||||   ||| ||| |||| ||
Db  999 AGCTTGCAGAAGGACTTCTCAT 1020 



RESULT 13 
ABN95678 

ID ABN95678 standard; DNA; 1580 BP. 
XX 

AC ABN95678; 
XX 

DT 13-AUG-2002 (first entry) 
XX 

DE Gene #2176 used to diagnose liver cancer. 
XX 



KW Gene; liver cancer; ds; hepatocellular carcinoma; hepatotropic; 

KW metastatic liver tumour; cytostatic; expression profile; disease state; 

KW disease progression; drug toxicity; drug efficacy; drug metabolism. 

XX 

OS Homo sapiens. 
XX 

PN WO200229103-A2. 
XX 

PD 11-APR-2002. 
XX 

PF 02-OCT-2001; 2001WO-US030589. 
XX 

PR 02-OCT-2000; 2000US-0237054P. 
XX 

PA (GENE-) GENE LOGIC INC. 
XX 

PI Home D, Alvares C, Peres-Da-Silva S, Vockley JG; 
XX 

DR WPI; 2002-426119/45. 
XX 

PT Diagnosing and detecting the progression of liver cancer, hepatocellular 

PT carcinoma or metastatic liver tumor in a patient, involves detecting the 

PT level of expression of two or more genes in a liver tissue sample. 
XX 

PS Claim 1; SEQ ID NO 2176; 298pp; English. 
XX 

CC The invention relates to a novel method for diagnosing and detecting the 



CC progression of liver cancer, hepatocellular carcinoma or metastatic liver 

CC tumour in a patient, and differentiating metastatic liver cancer from 

CC hepatocellular carcinoma in a patient, involving detecting the level of 

CC expression of two or more genes represented in ABN93503-ABN97455 in a 

CC tissue sample. The method of the invention has hepatotropic, and 

CC cytostatic activity. The method is useful for diagnosing and detecting 

CC the progression of liver cancer, hepatocellular carcinoma and metastatic 

CC liver carcinoma in a patient. The method is useful for identifying 

CC expression profiles which serve as useful diagnostic markers as well as 

CC markers that can be used to monitor disease states, disease progression, 

CC drug toxicity, drug efficacy and drug metabolism. Note: The sequence data 

CC for this patent did not form part of the printed specification, but was 

CC obtained in electronic format directly from WIPO at 

CC ftp.wipo.int/pub/published_pct_sequences 



XX 

SQ Sequence 1580 BP; 400 A; 434 C; 341 G; 405 T; 0 U; 0 Other; 

Query Match 15.3%; Score 173.6; DB 6; Length 1580; 

Best Local Similarity 51.9%; Pred. No. 2e-43; 

Matches 445; Conservative 0; Mismatches 404; Indels 9; Gaps 2; 

Qy 119 TGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGT 178 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I 

Db 180 TCATGTTGTTCTTCATCATGCTCTCGCTGGGCTGCACCATGGAGTTCAGCAAGATCAAGG 239 

Qy 179 CGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGC 238 

Mill II I II II III II I II I I I I I I II I I I I I I 

Db 240 CTCACTTATGGAAGCCTAAAGGGCTGGCCATCGCCCTGGTGGCACAGTATGGCATCATGC 299 

Qy 239 CTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTG 298 



I I 1 1 1 1 I I I 1 1 1 1 1 1 III I II 1 1 1 I III III 

Db 300 CCCTCACGGCCTTTGTGCTGGGCAAGGTCTTCCGGCTGAAGAACATTGAGGCACTGGCCA 359 

Qy 299 TTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTG 358 

I I II I I I I I I I II II II I I I II II I I I II I I 

Db 360 TCTTGGTCTGTGGCTGCTCACCTGGAGGGAACCTGTCCAATGTCTTCAGTCTGGCCATGA 419 

Qy 359 ATGGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAA 418 

I II II III I MINIM I I I I I Mill MIMI I Mill II I 

Db 420 AGGGGGACATGAACCTCAGCATTGTGATGACCACCTGCTCCACCTTCTGTGCCCTTGGCA 479 

Qy 419 TGATGCCACTCTGCATTTATCTCTACACC T GGT CCT GGAGT CTT CAGCAGAAT CT CA 475 

II I M II I II I I II I I I II II II II I I I I I I M 

Db 480 TGATGCCTCTCCTCCTGTACATCTACTCCAGGGGGATCTATGATGGGGACCTGAAGGACA 539 

Qy 476 CCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTG 535 

I M I I I I I I I I II I I I I I I II I II II II I I 

Db 540 AGGTGCCCTATAAAGGCATCGTGATATCACTGGTCCTGGTTCTCATTCCTTGCACCATAG 599 

Qy 536 GTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCG 595 

I II I II I I I I M I II I I I I I II I I I M 

Db 600 GGATCGTCCTCAAATCCAAACGGCCACAATACATGCGCTATGTCATCAAGGGAGGGATGA 659 

Qy 596 TTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGAT 655 

I I I I I I I I I I I I M I I I II M II 

Db 660 TCATCATTCTCTTGTGCAGTGTGGCCGTCACAGTTCTCTCTGCCATCAATGTGGGGAAGA 719 

Qy 656 CTTGGAATTCAGACAT CAC CCTTCTGACCATCAGTTTCATCTTTCCTTTGATTG 709 

I I I II I II II I MM III I I I II I I I I I 

Db 720 GCATCATGTTTGCCATGACACCACTCTTGATTGCCACCTCCTCCCTGATGCCTTTTATTG 779 

Qy 710 GCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGA 769 

I I I I I M II I I I I I II III Ml I I I I 

Db 780 GCTTTCTGCTGGGTTATGTTCTCTCTGCTCTCTTCTGCCTCAATGGACGGTGCAGACGCA 839 

Qy 770 CAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTAT 829 

I I I I II I I II M I I II I I II III II I II I I II I I 
Db 840 CTGTCAGCATGGAGACTGGATGCCAAAATGTCCAACTCTGTTCCACCATCCTCAATGTGG 899 

Qy 830 CTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCC 889 

III I MM III I III I I II II III I MM 

Db 900 CCTTTCCACCTGAAGTCATTGGACCACTTTTCTTCTTTCCCCTCCTCTACATGATTTTCC 959 

Qy 890 AGCTGATAGATGGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGA 949 

I I II I II I I I II I I I I II 

Db 960 AGCTTGGAGAAGGGCTTCTCCTCATTGCCATATTTTGGTGCTATGAGAAATTCAAGACTC 1019 

Qy  950 ACAAACATGGAAAAAAGA 967 
         |||  ||  || ||| |
Db 1020 CCAAGGATAAAACAAAAA 1037 
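
The row printed between each Qy row and its Db row marks the columns where the two residues are identical. A small sketch of how such a row can be regenerated from the two sequence rows, assuming a plain identity comparison (as the IDENTITY_NUC scoring table suggests) and treating spaces inside a row as extraction noise. The helper name match_row is hypothetical, and rows that contain alignment gaps are not handled.

```python
def match_row(query_row, db_row):
    """Return the '|' / ' ' marker row for two ungapped alignment rows."""
    # Drop stray spaces and normalise case before comparing column by column.
    query = query_row.replace(" ", "").upper()
    db = db_row.replace(" ", "").upper()
    if len(query) != len(db):
        raise ValueError("rows differ in length; gapped rows are not supported")
    return "".join("|" if q == d else " " for q, d in zip(query, db))


if __name__ == "__main__":
    # Final segment of the RESULT 13 alignment (Qy 950-967, Db 1020-1037).
    query_row = "ACAAACATGGAAAAAAGA"
    db_row = "CCAAGGATAAAACAAAAA"
    print(query_row)
    print(match_row(query_row, db_row))
    print(db_row)
```

Counting the `|` characters over all segments of one alignment should reproduce the Matches figure in that result's summary line.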



RESULT 14 
AAD56518 

ID AAD56518 standard; DNA; 1580 BP. 
XX 

AC AAD56518; 



XX 

DT 27-AUG-2003 (first entry) 
XX 

DE Human sodium/bile acid cotransporter, 8587 DNA. 
XX 

KW Human; cardiovascular disorder; coronary artery disease; bradycardia; 

KW restenosis; cardiac hypertrophy; ischaemia reperfusion injury; angina; 

KW arteriosclerosis; coronary artery ligation; rheumatic heart disease; 

KW heart failure; hypertension; cardiomyopathy; myocardial infarction; 

KW arterial inflammation; microembolism; atherosclerosis; endocarditis; 

KW vascular heart disease; valvular disease; arrhythmia; gene therapy; 

KW sinus node dysfunction; sodium-bile acid cotransporter; gene; ds. 
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT CDS 83..1132 

FT /*tag= a 

FT /product= "Human sodium/bile acid cotransporter protein" 
XX 

PN WO2003039341-A2. 
XX 

PD 15-MAY-2003. 
XX 

PF 05-NOV-2002; 2002WO-US035538. 
XX 

PR 05-NOV-2001; 2001US-0339582P. 
XX 

PA (MILL-) MILLENNIUM PHARM INC. 
XX 

PI Logan TJ, Chun M, Galvin KM; 
XX 

DR WPI; 2003-441437/41. 

DR P-PSDB; AAE37351. 
XX 

PT Treating a subject having a cardiovascular disorder, e.g. angina, 

PT arrhythmia, or restenosis, comprises administering a 139, 258, 1261, 

PT 1486, 2398, 2414, 7660, 8587, 10183, 10550, 12680, 17921, 32248, 60489 or 

PT 93804 modulator. 

XX 

PS Disclosure; Page 109; 124pp; English. 
XX 

CC The invention relates to methods and compositions for treating a subject 

CC having a cardiovascular disorder using 139, 258, 1261, 1486, 2398, 2414, 

CC 7660, 8587, 10183, 10550, 12680, 17921, 32248, 60489 or 93804 modulator. 

CC The invention is useful for treating a cardiovascular disorder, including 

CC arteriosclerosis, atherosclerosis, vascular wall remodeling, restenosis, 

CC cardiac hypertrophy, ischaemia reperfusion injury, arterial inflammation, 

CC ventricular remodelling, rapid ventricular pacing, tachycardia, coronary 

CC microembolism, bradycardia, pressure overload, aortic bending, coronary 

CC artery ligation, vascular heart disease, valvular disease, including but 

CC not limited to, valvular degeneration caused by calcification, rheumatic 

CC heart disease, endocarditis, or complications of artificial valves; 

CC atrial fibrillation, long-QT syndrome, congestive heart failure, sinus 

CC node dysfunction, angina, heart failure, hypertension, atrial flutter, 

CC atrial fibrillation, pericardial disease, including but not limited to 

CC pericardial effusion and pericarditis, cardiomyopathies (e.g. dilated 



CC cardiomyopathy or idiopathic cardiomyopathy), myocardial infarction, 

CC coronary artery disease, coronary artery spasm, ischaemic disease, 

CC arrhythmia, sudden cardiac death, and cardiovascular developmental 

CC disorders. The invention is also useful in gene therapy. The present 

CC sequence is human sodium/bile acid cotransporter DNA. This sequence is 

CC used to illustrate the method of the invention 
XX 

SQ Sequence 1580 BP; 400 A; 434 C; 341 G; 405 T; 0 U; 0 Other; 

Query Match 15.3%; Score 173.6; DB 7; Length 1580; 

Best Local Similarity 51.9%; Pred. No. 2e-43; 

Matches 445; Conservative 0; Mismatches 404; Indels 9; Gaps 2; 

Qy 119 TGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGT 178 

I I M II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 180 TCATGTTGTTCTTCATCATGCTCTCGCTGGGCTGCACCATGGAGTTCAGCAAGATCAAGG 239 

Qy 179 CGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGC 238 

I III I III II II III II I I I I I I I I I I I I I I I I I 

Db 240 CTCACTTATGGAAGCCTAAAGGGCTGGCCATCGCCCTGGTGGCACAGTATGGCATCATGC 299 

Qy 239 CTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTG 298 

I I I I I I I I I I II I I I Ml MINI I I I I I II 

Db 300 CCCTCACGGCCTTTGTGCTGGGCAAGGTCTTCCGGCTGAAGAACATTGAGGCACTGGCCA 359 

Qy 299 TTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTG 358 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 360 TCTTGGTCTGTGGCTGCTCACCTGGAGGGAACCTGTCCAATGTCTTCAGTCTGGCCATGA 419 

Qy 359 ATGGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAA 418 

I II I I III I I I I I I I I I Mill I I I II I I I I I I I I I I I I M I 

Db 420 AGGGGGACATGAACCTCAGCATTGTGATGACCACCTGCTCCACCTTCTGTGCCCTTGGCA 479 

Qy 419 T GAT GC C ACT CT G CAT T TAT CT CT AC AC C T GGT C CT G GAGT CTT C AGC AGAAT CT CA 475 

I I I I I I I I i I I I I I I I I I I I I II II I I I I I I II 

Db 480 TGATGCCTCTCCTCCTGTACATCTACTCCAGGGGGATCTATGATGGGGACCTGAAGGACA 539 

Qy 476 CCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTG 535 

I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 

Db 540 AGGTGCCCTATAAAGGCATCGTGATATCACTGGTCCTGGTTCTCATTCCTTGCACCATAG 599 

Qy 536 GTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCG 595 

I II I I I I I I I I I I I I I I I I I I I I I III 

Db 600 GGATCGTCCTCAAATCCAAACGGCCACAATACATGCGCTATGTCATCAAGGGAGGGATGA 659 

Qy 596 TTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGAT 655 

II I I I I III III Mill I I II II 

Db 660 TCATCATTCTCTTGTGCAGTGTGGCCGTCACAGTTCTCTCTGCCATCAATGTGGGGAAGA 719 

Qy 656 C T T G GAATT C AGAC AT CAC CCTTCTGACCATCAGTTTCATCTTTCCTTTGATTG 709 

I I I I I I I I II I III I III I I I I I I II I I 

Db 720 GCATCATGTTTGCCATGACACCACTCTTGATTGCCACCTCCTCCCTGATGCCTTTTATTG 779 

Qy 710 GCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGA 769 

I I I I I I I I I I I I I I II I M III I I I I 

Db 780 GCTTTCTGCTGGGTTATGTTCTCTCTGCTCTCTTCTGCCTCAATGGACGGTGCAGACGCA 839 



Qy 770 CAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTAT 829 
       |  |   | | || ||||||   || ||| | ||  | ||   |||||| ||| |  |
Db 840 CTGTCAGCATGGAGACTGGATGCCAAAATGTCCAACTCTGTTCCACCATCCTCAATGTGG 899 

Qy 830 CTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCC 889 
       | ||  |  ||||   | | |  |   | ||    || || ||   |||     | ||||
Db 900 CCTTTCCACCTGAAGTCATTGGACCACTTTTCTTCTTTCCCCTCCTCTACATGATTTTCC 959 

Qy 890 AGCTGATAGATGGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGA 949 
       ||||   ||| ||  ||||  |  ||||   || |  |   ||  |||       ||
Db 960 AGCTTGGAGAAGGGCTTCTCCTCATTGCCATATTTTGGTGCTATGAGAAATTCAAGACTC 1019 

Qy  950 ACAAACATGGAAAAAAGA 967 
         |||  ||  || ||| |
Db 1020 CCAAGGATAAAACAAAAA 1037 





RESULT 15 
ACC51213 

ID ACC51213 standard; cDNA; 1580 BP. 
XX 

AC ACC51213; 
XX 

DT 16-JUN-2003 (first entry) 
XX 

DE Human Plk-1 related cDNA sequence hmft-1603 SEQ ID NO: 98. 
XX 

KW Human; hepatoblastoma; cancer detection probe; cancer; detection; 

KW hepatocellular carcinoma; hereditary non-polyposis colorectal cancer; 

KW desmoid tumour; anaplastic thyroid carcinoma; Wilm's tumour; tumour; 

KW Plk-1; polo-like kinase-1; gene; ss. 

XX 

OS Homo sapiens. 
XX 

PN WO2003018807-A1. 
XX 

PD 06-MAR-2003. 
XX 

PF 26-AUG-2002; 2002WO-JP008580. 
XX 

PR 24-AUG-2001; 2001JP-00255225. 
XX 

PA (HISM ) HISAMITSU PHARM CO LTD. 

PA (CHIB-) CHIBA PREFECTURE. 

XX 

PI Nakagawara A; 
XX 

DR WPI; 2003-268424/26. 
XX 

PT Nucleic acid sequences differently expressed between hepatoblastoma and 
PT normal liver tissue, are useful for cancer detection and diagnosis. 
XX 

PS Claim 4; Page 156-157; 180pp; Japanese. 
XX 

CC The present invention describes nucleic acid sequences (I) having a 

CC different degree of expression in hepatoblastoma from their expression in 

CC normal liver tissue. ACC51116 to ACC51219 represents specifically claimed 



CC examples of (I). Also described: (1) nucleic acids stringently 

CC hybridising to (I); (2) cancer detection probes containing one or more of 

CC 104 listed sequences (II, see ACC51116 to ACC51219), including the 79 (I, 

CC see ACC51116 to ACC51194), or partial sequences derived from them; (3) 

CC PCR primers for cancer detection based on sequences (II); (4) marker 

CC proteins for cancer detection, encoded by (II); (5) diagnostic reagents 

CC for cancer diagnosis, containing (II) or their partial sequences. The 

CC nucleic acid sequences are useful in the detection and diagnosis of 

CC cancers including liver, colon, breast, kidney, bladder, ovary and 

CC thyroid cancer, especially for hepatoblastoma, hepatocellular carcinoma, 

CC hereditary non-polyposis colorectal cancer, desmoid tumour, anaplastic 

CC thyroid carcinoma and Wilm's tumour. They are also used as markers for 

CC predicting the prognosis of these tumours. ACC51220 to ACC51233 represent 

CC PCR primers used in the exemplification of the present invention. The 

CC nucleic acid sequences given in ACC51116 to ACC51219 are related to human 

CC Plk-1 (polo-like kinase-1), which is located on chromosome 16p12 

XX 

SQ Sequence 1580 BP; 400 A; 434 C; 341 G; 405 T; 0 U; 0 Other; 



Query Match 15.3%; Score 173.6; DB 7; Length 1580; 

Best Local Similarity 51.9%; Pred. No. 2e-43; 

Matches 445; Conservative 0; Mismatches 404; Indels 9; Gaps 2; 



Qy 119 TGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGT 178 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I 

Db 180 TCATGTTGTTCTTCATCATGCTCTCGCTGGGCTGCACCATGGAGTTCAGCAAGATCAAGG 239 

Qy 179 CGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGC 238 

I I I I I II I II M Ml M I I I I I I II I I I I II I I I 

Db 240 CTCACTTATGGAAGCCTAAAGGGCTGGCCATCGCCCTGGTGGCACAGTATGGCATCATGC 299 

Qy 239 CTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTG 298 

I I I I I I I I I I I I I I I III M I I I I I III I II 

Db 300 CCCTCACGGCCTTTGTGCTGGGCAAGGTCTTCCGGCTGAAGAACATTGAGGCACTGGCCA 359 

Qy 299 TTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTG 358 

III I I I I I I I I I I I I I I I I I I I II I I 

Db 360 TCTTGGTCTGTGGCTGCTCACCTGGAGGGAACCTGTCCAATGTCTTCAGTCTGGCCATGA 419 

Qy 359 ATGGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAA 418 

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 

Db 420 AGGGGGACATGAACCTCAGCATTGTGATGACCACCTGCTCCACCTTCTGTGCCCTTGGCA 479 

Qy 419 T GAT GC C ACT CT G CAT T T AT CT CT ACAC C TGGTCCTGGAGTCTTCAGCAGAATCTCA 475 

I I I I I I I I I I MM I I I I I I I II II I I I I I I II 
Db 480 TGATGCCTCTCCTCCTGTACATCTACTCCAGGGGGATCTATGATGGGGACCTGAAGGACA 539 

Qy 476 CCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTG 535 

I I I I I I I I I I I II I I I I I I I I I II I I I I I I 

Db 540 AGGTGCCCTATAAAGGCATCGTGATATCACTGGTCCTGGTTCTCATTCCTTGCACCATAG 599 

Qy 536 GTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCG 595 

I II I I I I I I I I I I I I I I I I I I I I I III 

Db 600 GGATCGTCCTCAAATCCAAACGGCCACAATACATGCGCTATGTCATCAAGGGAGGGATGA 659 

Qy 596 TTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGAT 655 

II I I I I I I I I I I I I I I I II II II 



Db 660 TCATCATTCTCTTGTGCAGTGTGGCCGTCACAGTTCTCTCTGCCATCAATGTGGGGAAGA 719 

Qy 656 CTTGGAATT CAGACAT CAC CCTTCTGACCATCAGTTTCATCTTTCCTTTGATTG 7 09 

I I I I I I I I I I I I I I I | I I I I I I I I I I I I 

Db 720 GCATCATGTTTGCCATGACACCACTCTTGATTGCCACCTCCTCCCTGATGCCTTTTATTG 779 

Qy 710 GCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGA 769 

I I I I I I I I I I I I I I II IN Ml I I I I 

Db 780 GCTTTCTGCTGGGTTATGTTCTCTCTGCTCTCTTCTGCCTCAATGGACGGTGCAGACGCA 839 

Qy 770 CAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTAT 829 

I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I 
Db 840 CTGTCAGCATGGAGACTGGATGCCAAAATGTCCAACTCTGTTCCACCATCCTCAATGTGG 899 

Qy 830 CTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCC 889 

III I I I II III I I M I I I I I I III I I I I I 

Db 900 CCTTTCCACCTGAAGTCATTGGACCACTTTTCTTCTTTCCCCTCCTCTACATGATTTTCC 959 

Qy 890 AGCTGATAGATGGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGA 949 

I I I I I I I I I I I I I I I I I I Mil I I I I I II 
Db 960 AGCTTGGAGAAGGGCTTCTCCTCATTGCCATATTTTGGTGCTATGAGAAATTCAAGACTC 1019 

Qy  950 ACAAACATGGAAAAAAGA 967 
         |||  ||  || ||| |
Db 1020 CCAAGGATAAAACAAAAA 1037 



Search completed: March 25, 2004, 16:38:33 
Job time : 592 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 



Run on: March 25, 2004, 16:01:20 ; Search time 122 Seconds 
(without alignments) 

Title: 

Perfect score: 

Sequence: 

Scoring table: 

Searched: 682709 seqs, 277475446 residues 

Total number of hits satisfying chosen parameters: 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 

Database : Issued_Patents_NA: * 

1: /cgn2_6/ptodata/2/ina/5A_COMB.seq:* 
4: /cgn2_6/ptodata/2/ina/6B_COMB.seq:* 

RESULT 1 

US-08-176-126B-1 

; Sequence 1, Application US/08176126B 

; Patent No. 5589358 

; GENERAL INFORMATION: 

APPLICANT: Dawson, Paul A. 

TITLE OF INVENTION: ILEAL BILE ACID TRANSPORTER COMPOSITIONS AND 
TITLE OF INVENTION: METHODS 
NUMBER OF SEQUENCES: 5 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Arnold, White & Durkee 

STREET: P.O. Box 4433 

CITY: Houston 

STATE: Texas 

COUNTRY: US 

ZIP: 77210 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 



COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS/ASCII 
SOFTWARE: PatentIn Release #1.0, Version #1.25 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/176,126B 
FILING DATE: 29-DEC-1993 
CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
NAME: Parker, David L. 
REGISTRATION NUMBER: 32,165 
REFERENCE/DOCKET NUMBER: WAKE:002/PAR 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (512) 418-3000 
TELEFAX: (512) 474-7577 
TELEX: na 
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 2263 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single 
TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
FEATURE: 

NAME/KEY: CDS 
LOCATION: 109..1152 
US-08-176-126B-1 

Query Match 28.3%; Score 320.4; DB 1; Length 2263; 

Best Local Similarity 60.8%; Pred. No. 8.2e-91; 

Matches 522; Conservative 0; Mismatches 336; Indels 0; Gaps 0; 
Qy 80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139 

I I II II I 1 1 I I III III I ! II 1 1 I 1 1 II 

Db 188 ACGCCATCCTCAGCGTGGTGATGAGCACCGTGCTCACAATCCTCCTAGCCTTGGTGATGT 247 

Qy 140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199 

III llll II Mill III III I I III I III I II MM 

Db 248 TTTCCATGGGGTGCAATGTGGAACTCCACAAGTTTCTGGGACACCTAAGGCGGCCATGGG 307 

Qy 200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259 

Mill Mill I Mill MM MM 1 1 1 1 1 1 M I 1 1 1 1 1 1 Mill 

Db 308 GCATCGTCGTGGGCTTCCTCTGTCAGTTTGGAATCATGCCTCTCACAGGTTTCGTCCTGT 367 

Qy 260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319 

Ml MM I Mill MUM II II II III III 1 1 Mill 

Db 368 CCGTGGCCTTTGGCATCCTCCCAGTGCAAGCTGTGGTGGTGCTGATCCAGGGTTGCTGCC 427 

Qy 320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379 

I M II II Ml II II I III Mill Mill II Mill MUM 

Db 428 CTGGAGGAACTGCCTCCAATATCCTAGCCTATTGGGTAGATGGCGACATGGACCTCAGCG 487 

Qy 380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439 

I II Mill Mill Mill II Mill I II Ml III II II III I I 

Db 488 TTAGCATGACCACCTGCTCCACGCTGCTTGCCCTTGGAATGATGCCCCTTTGCCTCTTCA 547 

Qy 440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499 

II II I I I I llll II I II I II II II I I M I I I I 



Db 548 TCTATACCAAGATGTGGGTTGACTCAGGGACGATTGTGATTCCTTATGACAGCATTGGCA 607 



Qy 500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559 

llllll II II II MM MIMI I I llllllll Ml Mill 

Db 608 CTTCTCTGGTTGCTCTTGTTATTCCTGTTTCCATTGGAATGTATGTGAATCACAAATGGC 667 

Qy 560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619 

I II M I II IMM II II Mill II II MM I III II I 

Db 668 CCCAAAAAGCAAAGATCATACTTAAAATTGGATCCATCGCAGGTGCAATTCTCATTGTTC 727 

Qy 620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679 

I MM II I III I I Ml II I I MM II MM 

Db 728 TCATCGCTGTGGTTGGAGGAATACTGTACCAAAGTGCCTGGACCATTGAACCCAAGCTGT 787 

Qy 680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739 

■ II II I Ml I MM I MM III Ml III I Mill 

Db 788 GGATTATAGGAACCATATATCCTATAGCTGGCTACGGCCTGGGGTTTTTCCTGGCTAGAA 847 

Qy 740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799 

Ml II I Ml I llllll I III II MM Mill II Mill I 

Db 848 TTGCTGGTCAACCCTGGTACAGGTGCCGAACAGTTGCCTTGGAAACCGGGTTGCAGAACA 907 



Qy 800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859 

■ ■ MM MM llllll I III I llllll IMM III II III 

Db 908 CTCAGCTGTGTTCCACCATTGTGCAGCTTTCCTTCAGCCCTGAGGACCTCAACCTTGTGT 967 

Qy 860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919 

II IMM II III I MM Ml I I Ml I I I I II 

Db 968 TCACCTTCCCCCTCATCTACAGCATCTTCCAGATCGCCTTTGCAGCAATACTATTAGGAG 1027 



Qy  920 CATATCAGACGTACAAGA 937 

        | |||    | ||||||| 

Db 1028 CTTATGTCGCATACAAGA 1045 
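
The summary figures printed for each result can be cross-checked from the counts on the same lines. The sketch below is only an illustration and is not the search program's own code. It assumes that Query Match is the hit score divided by the query's perfect score (1134 in the search header for this query) and that Best Local Similarity is the match count divided by the number of aligned columns. Both assumptions agree with the values reported in this listing, but the function names are hypothetical.

```python
# Hypothetical helpers for sanity-checking the per-result summary lines.
# Assumed definitions (not taken from the search tool's documentation):
#   Query Match %           = 100 * score / perfect score
#   Best Local Similarity % = 100 * matches / aligned columns


def query_match_pct(score: float, perfect_score: float) -> float:
    """Hit score as a percentage of the query's perfect (self-match) score."""
    if perfect_score <= 0:
        raise ValueError("perfect_score must be positive")
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches: int, mismatches: int,
                              indels: int = 0, conservative: int = 0) -> float:
    """Identical columns as a percentage of all aligned columns."""
    columns = matches + conservative + mismatches + indels
    if columns == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # RESULT 1: Score 320.4; Matches 522; Mismatches 336; Indels 0
    print(f"{query_match_pct(320.4, 1134):.1f}")        # 28.3
    print(f"{best_local_similarity_pct(522, 336):.1f}")  # 60.8
```

The same two calls reproduce the 26.3% and 58.5% reported for the 1047-base entries (Score 297.8; Matches 518; Mismatches 367).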



RESULT 2 

US-08-669-435-1 

; Sequence 1, Application US/08669435 

; Patent No. 5869265 

; GENERAL INFORMATION: 

APPLICANT: Dawson, Paul A. 

TITLE OF INVENTION: ILEAL BILE ACID TRANSPORTER COMPOSITIONS AND 
TITLE OF INVENTION: METHODS 
; NUMBER OF SEQUENCES: 5 

CORRESPONDENCE ADDRESS: 

ADDRESSEE: Arnold, White & Durkee 

STREET: P.O. Box 4433 

CITY: Houston 

STATE: Texas 

COUNTRY: US 

ZIP: 77210 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS/ASCII 

SOFTWARE: PatentIn Release #1.0, Version #1.25 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/669,435 
FILING DATE: 26-JUN-1996 
CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/176,126 
FILING DATE: 29-DEC-1993 
CLASSIFICATION: 
ATTORNEY/AGENT INFORMATION: 
NAME: Parker, David L. 
REGISTRATION NUMBER: 32,165 
REFERENCE/DOCKET NUMBER: WAKE:002/PAR 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (512) 418-3000 
TELEFAX: (512) 474-7577 
TELEX: na 
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 2263 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single 
TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
FEATURE: 

NAME/KEY: CDS 
LOCATION: 109..1152 
US-08-669-435-1 

Query Match 28.3%; Score 320.4; DB 2; Length 2263; 

Best Local Similarity 60.8%; Pred. No. 8.2e-91; 

Matches 522; Conservative 0; Mismatches 336; Indels 0; Gaps 0; 
Qy 80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139 

I I I III III II III III I I I ! II I MM 

Db 188 ACGCCATCCTCAGCGTGGTGATGAGCACCGTGCTCACAATCCTCCTAGCCTTGGTGATGT 247 

Qy 140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199 

I II MM II Mill III III I I III I III I II MM 

Db 248 TTTCCATGGGGTGCAATGTGGAACTCCACAAGTTTCTGGGACACCTAAGGCGGCCATGGG 307 

Qy 200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259 

MM I Mill I Mill IMIIIII IIIIMM I MM II Mill 

Db 308 GCATCGTCGTGGGCTTCCTCTGTCAGTTTGGAATCATGCCTCTCACAGGTTTCGTCCTGT 367 

Qy 260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319 

III MM I Mill II MM I I II II Ml Ml Mill 1 1 

Db 368 CCGTGGCCTTTGGCATCCTCCCAGTGCAAGCTGTGGTGGTGCTGATCCAGGGTTGCTGCC 427 

Qy 320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379 

I II II II III II II I Ml Mill Mill II Mill MMM 

Db 428 CTGGAGGAACTGCCTCCAATATCCTAGCCTATTGGGTAGATGGCGACATGGACCTCAGCG 487 

Qy 380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439 

I II Mill Mill Mill II Mill II IM II III I I 

Db 488 TTAGCATGACCACCTGCTCCACGCTGCTTGCCCTTGGAATGATGCCCCTTTGCCTCTTCA 547 

Qy 440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499 

MM III I III I II I 11:1 I I III II I 



Db 548 TCTATACCAAGATGTGGGTTGACTCAGGGACGATTGTGATTCCTTATGACAGCATTGGCA 607 



Qy 500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559 

I Ml II II llllllll II Ml I lllllllll III 1 1 1 1 1 

Db 608 CTTCTCTGGTTGCTCTTGTTATTCCTGTTTCCATTGGAATGTATGTGAATCACAAATGGC 667 

Qy 560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619 

I I M I I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I I 1 1 1 1 lllllll 

Db 668 CCCAAAAAGCAAAGATCATACTTAAAATTGGATCCATCGCAGGTGCAATTCTCATTGTTC 727 

Qy 620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679 

I MM II I MM I Ml llllllll M MM 

Db 728 TCATCGCTGTGGTTGGAGGAATACTGTACCAAAGTGCCTGGACCATTGAACCCAAGCTGT 787 

Qy 680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739 

II M I III I MM I MM I I I III Ml I Mill 

Db 788 GGATTATAGGAACCATATATCCTATAGCTGGCTACGGCCTGGGGTTTTTCCTGGCTAGAA 847 

Qy 740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799 

III 1 1 I Ml I MMM I Ml II MM Mill M MMM 

Db 848 TTGCTGGTCAACCCTGGTACAGGTGCCGAACAGTTGCCTTGGAAACCGGGTTGCAGAACA 907 

Qy 800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859 

MM MM MMM Mill II MM II MMM II III 

Db 908 CTCAGCTGTGTTCCACCATTGTGCAGCTTTCCTTCAGCCCTGAGGACCTCAACCTTGTGT 967 

Qy 860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919 

II lllllll Ml I Mill MM II I I I II II 

Db  968 TCACCTTCCCCCTCATCTACAGCATCTTCCAGATCGCCTTTGCAGCAATACTATTAGGAG 1027 

Qy  920 CATATCAGACGTACAAGA 937 

        | |||    | ||||||| 

Db 1028 CTTATGTCGCATACAAGA 1045 



RESULT 3 

PCT-US94-14431A-1 

; Sequence 1, Application PC/TUS9414431A 
; GENERAL INFORMATION: 
APPLICANT: 

TITLE OF INVENTION: ILEAL BILE ACID TRANSPORTER COMPOSITIONS 
NUMBER OF SEQUENCES: 11 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Arnold, White & Durkee 

STREET: P. O. Box 4433 

CITY: Houston 

STATE: Texas 

COUNTRY: United States of America 

ZIP: 77210 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS/ASCII 

SOFTWARE: PatentIn Release #1.0, Version 

SOFTWARE: #1.25 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: PCT/US94/14431A 



FILING DATE: 29-DEC-1994 
CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: USSN 08/176,126 
FILING DATE: 29-DEC-1993 
CLASSIFICATION: 
ATTORNEY/AGENT INFORMATION: 
NAME: PARKER, DAVID L. 
REGISTRATION NUMBER: 32,165 
REFERENCE/DOCKET NUMBER: WAKE0 05P- - 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (512) 418-3000 
TELEFAX: (713) 789-2679 

TELEX: 79-0924(1) GENERAL INFORMATION: 
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 2263 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single 
TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
FEATURE: 

NAME/KEY: CDS 
LOCATION: 109..1152 
PCT-US94-14431A-1 

Query Match 28.3%; Score 320.4; DB 5; Length 2263; 

Best Local Similarity 60.8%; Pred. No. 8.2e-91; 

Matches 522; Conservative 0; Mismatches 336; Indels 0; Gaps 0; 
Qy 80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139 

II I III I II II III Ml I I II I! Mill 

Db 188 ACGCCATCCTCAGCGTGGTGATGAGCACCGTGCTCACAATCCTCCTAGCCTTGGTGATGT 247 

Qy 140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199 

III MM II Mill III MM I MM III I II MM 

Db 248 TTTCCATGGGGTGCAATGTGGAACTCCACAAGTTTCTGGGACACCTAAGGCGGCCATGGG 307 

Qy 200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259 

MM I Mill I Mill Mill! MINIM I Mil II Mill 

Db 308 GCATCGTCGTGGGCTTCCTCTGTCAGTTTGGAATCATGCCTCTCACAGGTTTCGTCCTGT 367 

Qy 260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319 

III MM I Mill III I II I I II II Ml Ml I M MM 

Db 368 CCGTGGCCTTTGGCATCCTCCCAGTGCAAGCTGTGGTGGTGCTGATCCAGGGTTGCTGCC 427 

Qy 320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379 

Ml II II III II II I III Mill MM II III III MM 

Db 428 CTGGAGGAACTGCCTCCAATATCCTAGCCTATTGGGTAGATGGCGACATGGACCTCAGCG 487 

Qy 380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439 

I II II II I Mill I II I I II I I I I I I I I I I 1 1 1 I I II Ml I 
Db 488 TTAGCATGACCACCTGCTCCACGCTGCTTGCCCTTGGAATGATGCCCCTTTGCCTCTTCA 547 

Qy 440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499 

Ml III I III I II I I I I I I i I I I I I III II I 

Db 548 TCTATACCAAGATGTGGGTTGACTCAGGGACGATTGTGATTCCTTATGACAGCATTGGCA 607 

Qy 500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559 

I I I! II II 1 1 1 1 1 i i I II 1 1 1 1 I MINIMI Ml Mill 

Db 608 CTTCTCTGGTTGCTCTTGTTATTCCTGTTTCCATTGGAATGTATGTGAATCACAAATGGC 667 

Qy 560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619 

I II I! I II Mill II II Mill II I I MM I III II I 

Db 668 CCCAAAAAGCAAAGATCATACTTAAAATTGGATCCATCGCAGGTGCAATTCTCATTGTTC 727 

Qy 620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679 

I MM II I III I MM II I I MM II II II 

Db 728 TCATCGCTGTGGTTGGAGGAATACTGTACCAAAGTGCCTGGACCATTGAACCCAAGCTGT 787 

Qy 680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739 

II II I III I MM I MM I I I III III I Mill 

Db 788 GGATTATAGGAACCATATATCCTATAGCTGGCTACGGCCTGGGGTTTTTCCTGGCTAGAA 847 

Qy 740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799 

III 1 1 I Ml I Ml Ml I III II MM Mill II Mill I 

Db 848 TTGCTGGTCAACCCTGGTACAGGTGCCGAACAGTTGCCTTGGAAACCGGGTTGCAGAACA 907 

Qy 800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859 

MM MM M II M I MM II MM MM MM II III 

Db 908 CTCAGCTGTGTTCCACCATTGTGCAGCTTTCCTTCAGCCCTGAGGACCTCAACCTTGTGT 967 

Qy 860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919 

II II I I I I I I II I II I II I I I' I III I I I II I 

Db  968 TCACCTTCCCCCTCATCTACAGCATCTTCCAGATCGCCTTTGCAGCAATACTATTAGGAG 1027 

Qy  920 CATATCAGACGTACAAGA 937 

        | |||    | ||||||| 

Db 1028 CTTATGTCGCATACAAGA 1045 



RESULT 4 

US-08-176-126B-3 

; Sequence 3, Application US/08176126B 
; Patent No. 5589358 

GENERAL INFORMATION: 

APPLICANT: Dawson, Paul A. 

TITLE OF INVENTION: ILEAL BILE ACID TRANSPORTER COMPOSITIONS AND 
TITLE OF INVENTION: METHODS 
NUMBER OF SEQUENCES: 5 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Arnold, White & Durkee 

STREET: P.O. Box 4433 

CITY: Houston 
; STATE: Texas 

COUNTRY: US 

ZIP: 77210 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS/ASCII 

SOFTWARE: PatentIn Release #1.0, Version #1.25 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/176,126B 
FILING DATE: 29-DEC-1993 
CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
NAME: Parker, David L. 
REGISTRATION NUMBER: 32,165 
REFERENCE/DOCKET NUMBER: WAKE:002/PAR 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (512) 418-3000 
TELEFAX: (512) 474-7577 
TELEX: na 
INFORMATION FOR SEQ ID NO: 3: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1047 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single 
TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
FEATURE: 

NAME/KEY: CDS 
LOCATION: 1..1044 
US-08-176-126B-3 

Query Match 26.3%; Score 297.8; DB 1; Length 1047; 

Best Local Similarity 58.5%; Pred. No. 7.2e-84; 

Matches 518; Conservative 0; Mismatches 367; Indels 0; Gaps 0; 
Qy 80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139 

II . I III Ml II Mil II I II III II Mill 

Db 80 ATAACATCCTAAGTGTGGTCCTAAGTACGGTGCTGACCATCCTGTTGGCCTTGGTGATGT 139 

Qy 140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199 

INI I I I I I I I I II I I II I I III I I II I I I I I I I I I I i I 

Db 140 TCTCCATGGGATGCAACGTGGAAATCAAGAAATTTCTAGGGCACATAAAGCGGCCGTGGG 199 

Qy 200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259 

MM Ml II I I )! 1 1 I M 1 1 1 1 1 I i 1 1 E 1 1 Mill I MM 

Db 200 GCATTTGTGTTGGCTTCCTCTGTCAGTTTGGAATCATGCCCCTCACAGGATTCATCCTGT 259 

Qy 260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319 

I I MM I II MM II I I II MM II II Ml 

Db 260 CGGTGGCCTTTGACATCCTCCCGCTCCAGGCCGTAGTGGTGCTCATTATAGGATGCTGCC 319 

Qy 320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379 

I I II II I III M III! Ill Mil Mill II Mill IE Ml 

Db 320 CTGGAGGAACTGCCTCCAATATCTTGGCCTATTGGGTCGATGGCGACATGGACCTGAGCG 379 

Qy 380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439 

MM Mill II II MM II MM III 1 1 MM! II MM I 

Db 380 TCAGCATGACCACATGCTCCACACTGCTTGCCCTCGGAATGATGCCGCTGTGCCTCCTTA 439 

Qy 440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499 

MM Ml III II U Mill III MINIM I 

Db 440 TCTATACCAAAATGTGGGTCGACTCTGGGAGCATCGTAATTCCCTATGATAACATAGGTA 499 

Qy 500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559 

Mill M III! MUM I hill III III II I II 

Db 500 CATCTCTGGTTGCTCTCGTTGTTCCTGTTTCCATTGGAATGTTTGTTAATCACAAATGGC 559 



Qy 560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619 

I II II I II Mill II II MINI II I I III Mill II II 

Db 560 CCCAAAAAGCAAAGATCATACTTAAAATTGGGTCCATCGCGGGCGCCATCCTCATTGTGC 619 

Qy 620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679 

I I II IN MM III II I I MM I II II 

Db 620 TCATAGCTGTGGTTGGAGGAATATTGTACCAAAGCGCCTGGATCATTGCTCCCAAACTGT 679 

Qy 680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739 

II II I II III III II II I I III Mill Mill 

Db 680 GGATTATAGGAACAATATTTCCTGTGGCGGGTTACTCCCTGGGGTTTCTTCTGGCTAGAA 739 

Qy 740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799 

Ml I I III I MUM III III M Mill II Mill I 

Db 740 TTGCTGGTCTACCCTGGTACAGGTGCCGAACGGTTGCTTTTGAAACGGGGATGCAGAACA 799 

Qy 800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859 

III I II MM M I Mil II MUM Mill I I II 

Db 800 CGCAGCTATGTTCCACCATCGTTCAGCTCTCCTTCACTCCTGAGGAGCTCAATGTCGTAT 859 

Qy 860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919 

II 1 1 M I M Mi l l ! M M M I II I I Ml 

Db 860 TCACCTTCCCGCTCATCTACAGCATTTTCCAGCTCGCCTTTGCCGCAATATTCTTAGGAT 919 

Qy 920 CATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGAAAAA 964 

Ml MUM! I || II M I I III 

Db 920 TTTATGTGGCATACAAGAAATGTCATGGAAAAAACAAGGCAGAAA 964 



RESULT 5 
US-08-669-435-3 

Sequence 3, Application US/08669435 
Patent No. 5869265 
GENERAL INFORMATION: 
APPLICANT: Dawson, Paul A. 
TITLE OF INVENTION: ILEAL BILE ACID TRANSPORTER COMPOSITIONS AND 
TITLE OF INVENTION: METHODS 
NUMBER OF SEQUENCES: 5 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Arnold, White & Durkee 
STREET: P.O. Box 4433 
CITY: Houston 
STATE: Texas 
COUNTRY: US 
ZIP: 77210 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS/ASCII 
SOFTWARE: PatentIn Release #1.0, Version #1.25 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/669,435 
FILING DATE: 26-JUN-1996 
CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/176,126 

FILING DATE: 29-DEC-1993 

CLASSIFICATION: 
ATTORNEY/AGENT INFORMATION: 
; NAME: Parker, David L. 

REGISTRATION NUMBER: 32,165 

REFERENCE/DOCKET NUMBER: WAKE:002/PAR 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (512) 418-3000 

TELEFAX: (512) 474-7577 

TELEX: na 
; INFORMATION FOR SEQ ID NO: 3: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 1047 base pairs 
; TYPE: nucleic acid 

STRANDEDNESS: single 

TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
FEATURE: 

NAME/KEY: CDS 

LOCATION: 1..1044 
US-08-669-435-3 

Query Match 26.3%; Score 297.8; DB 2; Length 1047; 

Best Local Similarity 58.5%; Pred. No. 7.2e-84; 

Matches 518; Conservative 0; Mismatches 367; Indels 0; Gaps 0; 
Qy 80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139 

Db 80 ATAACATCCTAAGTGTGGTCCTAAGTACGGTGCTGACCATCCTGTTGGCCTTGGTGATGT 139 

Qy 140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199 

Db 140 TCTCCATGGGATGCAACGTGGAAATCAAGAAATTTCTAGGGCACATAAAGCGGCCGTGGG 199 

Qy 200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259 

Db 

Qy 260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319 

Db 260 CGGTGGCCTTTGACATCCTCCCGCTCCAGGCCGTAGTGGTGCTCATTATAGGATGCTGCC 319 

Qy 320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379 

Db 

Qy 380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439 

Db 

Qy 440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499 

Db 

Qy 500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559 

I II II II MMMI II MM I I III III III Mill 

Db 500 CATCTCTGGTTGCTCTCGTTGTTCCTGTTTCCATTGGAATGTTTGTTAATCACAAATGGC 559 



Qy 560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619 

I II II I II Mill II II 1 1 1 1 1 1 MM II I 1 1 1 i 1 II II 

Db 560 CCCAAAAAGCAAAGATCATACTTAAAATTGGGTCCATCGCGGGCGCCATCCTCATTGTGC 619 

Qy 620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679 

I Ml II I MM III Mil MM I II M 

Db 620 TCATAGCTGTGGTTGGAGGAATATTGTACCAAAGCGCCTGGATCATTGCTCCCAAACTGT 679 

Qy 680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739 

II II I II MINI II II I I III Mill Mill 

Db 680 GGATTATAGGAACAATATTTCCTGTGGCGGGTTACTCCCTGGGGTTTCTTCTGGCTAGAA 739 

Qy 740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799 

III I MM I llllll III Ml II Mill II Mill I 

Db 740 TTGCTGGTCTACCCTGGTACAGGTGCCGAACGGTTGCTTTTGAAACGGGGATGCAGAACA 799 

Qy 800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859 

llllll llllll I III I II llllll Hill I I II 

Db 800 CGCAGCTATGTTCCACCATCGTTCAGCTCTCCTTCACTCCTGAGGAGCTCAATGTCGTAT 859 

Qy 860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919 

I I Mill M Ml I I IIIIMM II I I M I 

Db 860 TCACCTTCCCGCTCATCTACAGCATTTTCCAGCTCGCCTTTGCCGCAATATTCTTAGGAT 919 

Qy 920 CATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGAAAAA 964 

III I I lllllll I II II II I III 

Db 920 TTTATGTGGCATACAAGAAATGTCATGGAAAAAACAAGGCAGAAA 964 



RESULT 6 

PCT-US94-14431A-3 

; Sequence 3, Application PC/TUS9414431A 
; GENERAL INFORMATION: 
APPLICANT: 

TITLE OF INVENTION: ILEAL BILE ACID TRANSPORTER COMPOSITIONS 
NUMBER OF SEQUENCES: 11 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Arnold, White & Durkee 

STREET: P. O. Box 4433 

CITY: Houston 
; STATE: Texas 

COUNTRY: United States of America 

ZIP: 77210 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS/ASCII 

SOFTWARE: PatentIn Release #1.0, Version 

SOFTWARE: #1.25 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: PCT/US94/14431A 

FILING DATE: 29-DEC-1994 

CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: USSN 08/176,126 

FILING DATE: 29-DEC-1993 



CLASSIFICATION: 
ATTORNEY/AGENT INFORMATION: 
NAME: PARKER, DAVID L. 
REGISTRATION NUMBER: 32,165 
REFERENCE/DOCKET NUMBER: WAKE0 05P- - 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (512) 418-3000 
TELEFAX: (713) 789-2679 

TELEX: 79-0924(1) GENERAL INFORMATION: 
INFORMATION FOR SEQ ID NO: 3: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1047 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single 
TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
FEATURE: 

NAME/KEY: CDS 
LOCATION: 1..1044 
PCT-US94-14431A-3 

Query Match 26.3%; Score 297.8; DB 5; Length 1047; 

Best Local Similarity 58.5%; Pred. No. 7.2e-84; 

Matches 518; Conservative 0; Mismatches 367; Indels 0; Gaps 0; 
Qy 80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139 

II I III Mill I II I II I II III II Mill 

Db 80 ATAACATCCTAAGTGTGGTCCTAAGTACGGTGCTGACCATCCTGTTGGCCTTGGTGATGT 139 

Qy 140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199 

1 1 1 1 MM III Ill Ml I llllll I I I II MM 

Db 140 TCTCCATGGGATGCAACGTGGAAATCAAGAAATTTCTAGGGCACATAAAGCGGCCGTGGG 199 

Qy 200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259 

Mill III II I Mill III Mill Mill 1 1 Mill I Mill 

Db 200 GCATTTGTGTTGGCTTCCTCTGTCAGTTTGGAATCATGCCCCTCACAGGATTCATCCTGT 259 

Qy 260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319 

I I Ml! I II MM llllll Mill II II IMMM 

Db 260 CGGTGGCCTTTGACATCCTCCCGCTCCAGGCCGTAGTGGTGCTCATTATAGGATGCTGCC 319 

Qy 320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379 

I II 1 1 1 1 1 1 1 1 1 1 1 1 1 III 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 320 CTGGAGGAACTGCCTCCAATATCTTGGCCTATTGGGTCGATGGCGACATGGACCTGAGCG 379 

Qy 380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439 

MM Mill M M II Ml II Mill II II IMMM M Ml I I 

Db 380 TCAGCATGACCACATGCTCCACACTGCTTGCCCTCGGAATGATGCCGCTGTGCCTCCTTA 439 

Qy 440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499 

IMMM Ml M II III I MM MM 1 1 MM 

Db 440 TCTATACCAAAATGTGGGTCGACTCTGGGAGCATCGTAATTCCCTATGATAACATAGGTA 499 

Qy 500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559 

Mill M IMMM II MM I MM Ml III Mill 

Db 500 CATCTCTGGTTGCTCTCGTTGTTCCTGTTTCCATTGGAATGTTTGTTAATCACAAATGGC 559 



Qy 560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619 

i ii m i iMiiii ii ii mill it ii ii i iiiii ii ii 

Db 560 CCCAAAAAGCAAAGATCATACTTAAAATTGGGTCCATCGCGGGCGCCATCCTCATTGTGC 619 

Qy 620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679 

I I II II I III I III llllllll I II II 

Db 620 TCATAGCTGTGGTTGGAGGAATATTGTACCAAAGCGCCTGGATCATTGCTCCCAAACTGT 679 

Qy 680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739 

II II I I! Ill II I II III I III IIIII Mill 

Db 680 GGATTATAGGAACAATATTTCCTGTGGCGGGTTACTCCCTGGGGTTTCTTCTGGCTAGAA 739 

Qy 740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799 

III I I Ml I Mill! I II III II Mill II IIIII I 

Db 740 TTGCTGGTCTACCCTGGTACAGGTGCCGAACGGTTGCTTTTGAAACGGGGATGCAGAACA 799 

Qy 800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859 

III I II llllll I Ml I II MUM IIIII I I II 

Db 800 CGCAGCTATGTTCCACCATCGTTCAGCTCTCCTTCACTCCTGAGGAGCTCAATGTCGTAT 859 

Qy 860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919 

I I Mill II III I I II MUM II IIIII 

Db 860 TCACCTTCCCGCTCATCTACAGCATTTTCCAGCTCGCCTTTGCCGCAATATTCTTAGGAT 919 

Qy 920 CATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGAAAAA 964 

III I I IMIIII I II II I I I III 

Db 920 TTTATGTGGCATACAAGAAATGTCATGGAAAAAACAAGGCAGAAA 964 



RESULT 7 

US-09-833-381-317/C 

Sequence 317, Application US/09833381 
Patent No. 6672186 
GENERAL INFORMATION: 
APPLICANT: Robison, Keith E. 

TITLE OF INVENTION: No. 6672186el Nucleic Acid and Protein Homologs 
FILE REFERENCE: 5800-119 

CURRENT APPLICATION NUMBER: US/09/833,381 
CURRENT FILING DATE: 2001-04-11 
PRIOR APPLICATION NUMBER: 09/516,448 
PRIOR FILING DATE: 2000-02-29 
NUMBER OF SEQ ID NOS: 2050 

SOFTWARE: FastSEQ for Windows Version 3.0 
SEQ ID NO 317 
LENGTH: 310 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 

NAME/KEY: misc_feature 
LOCATION: (1)...(310) 
OTHER INFORMATION: n = A,T,C or G 
US-09-833-381-317 



Query Match 6.9%; Score 77.8; DB 4; Length 310; 

Best Local Similarity 60.2%; Pred. No. 1.3e-14; 

Matches 127; Conservative 0; Mismatches 84; Indels 0; Gaps 0; 



Qy 748 CAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATG 807 

II I Ml I MINI I III I 1 1 1 1 MINIMI Mill I MM II 

Db 221 

Qy 808 TGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTC 867 

III MUM I III I II Ml I MM! I III MM I III 

Db 161 TOCTCCACCATTGTACAGCTCTCCTTCTCCCCCGAGGATCTCAACCTGGTGTTCACCTTC 102 

Qy 868 

II 1 1 1 1 I llllllll I III Mill III 

Db 101 

Qy 928 ACGTACAAGAGGAGATTGAAGAACAAACATG 958 

M MM II II MM Ml 

Db 41 



RESULT 8 

US-09-252-991A-7387 

Sequence 7387, Application US/09252991A 
Patent No. 6551795 
GENERAL INFORMATION: 
APPLICANT: Marc J. Rubenfield et al. 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS 

TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: 107196.136 

CURRENT APPLICATION NUMBER: US/09/252,991A 
CURRENT FILING DATE: 1999-02-18 
PRIOR APPLICATION NUMBER: US 60/074,788 
PRIOR FILING DATE: 1998-02-18 
PRIOR APPLICATION NUMBER: US 60/094,190 
PRIOR FILING DATE: 1998-07-27 
NUMBER OF SEQ ID NOS: 33142 
SEQ ID NO 7387 
LENGTH: 927 
TYPE: DNA 

ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-7387 

Query Match 5.6%; Score 64; DB 4; Length 927; 

Best Local Similarity 48.9%; Pred. No. 6e-10; 

Matches 172; Conservative 0; Mismatches 180; Indels 0; Gaps 0; 
Qy 91 GAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGTTCTCTTTGGGA 150 

1 1 I I I I 1 1 I II III 1 1 1 1 I 1 1 1 1 II I I I 1 1 1 

Db 43 GATCCCATCCTGACCCTGTTCCTCCCCATCGCACTGGGCATCATCATGCTCGGTCTCGGA 102 

Qy 151 TGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGGGCATTGCTGTG 210 

III II II II III I II 

Db 103 CTGTCCCTGACCCCGGCCGACTTCCTCCGCGTGGTGCGCTACCCGAAGCCGGTGCTGGTC 162 

Qy 211 GGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGGCCATTAGCTTT 270 

imii i mill i i ii 1 1 ii i i ii ii ii i 1 1 1 1 

Db 163 GGCCTGGTGTGCCAGATCGTCCTGCTGCCCCTGGCCTGTTTCCTGATCGTCCAGGGCTTC 222 



Qy 271 TCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCCCGGGGGGCACC 330 



I III II I I II II II I I I Ml II II II llllll 

Db 223 GCCCTGGAGGCGGCCCTGGCGGTCGGCATGATGTTGCTGGCGGCCTCGCCCGGCGGCACC 282 

Qy 331 ATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCATCAGTATGACA 390 

II I 1 1 1 II I M I MINIM III II I 1 1 1 II MM 

Db 283 ACCGCCAACCTCTACAGCCACCTGGCGCATGGCGACGTGGCACTGAACATCACCCTGACC 342 

Qy 391 ACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATCTCT 442 

I I I III MM I I Mill II I I Mill 

Db 343 GCGGTGAACTCGGTGATCGCCATCCTCACCATGCCGCTGATCGTCAATCTGT 394 



RESULT 9 

US-09-252-991A-7319 

Sequence 7319, Application US/09252991A 
Patent No. 6551795 
GENERAL INFORMATION: 
APPLICANT: Marc J. Rubenfield et al. 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS 

TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: 107196.136 

CURRENT APPLICATION NUMBER: US/09/252,991A 
CURRENT FILING DATE: 1999-02-18 
PRIOR APPLICATION NUMBER: US 60/074,788 
PRIOR FILING DATE: 1998-02-18 
PRIOR APPLICATION NUMBER: US 60/094,190 
PRIOR FILING DATE: 1998-07-27 
NUMBER OF SEQ ID NOS: 33142 
SEQ ID NO 7319 
LENGTH: 978 
TYPE: DNA 

ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-7319 

Query Match 5.6%; Score 64; DB 4; Length 978; 

Best Local Similarity 48.9%; Pred. No. 6.2e-10; 

Matches 172; Conservative 0; Mismatches 180; Indels 0; Gaps 0; 

Qy 91 GAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGTTCTCTTTGGGA 150 

Mill III II I II I I II I I I M II II I I I I I 

Db 68 GATCCCATCCTGACCCTGTTCCTCCCCATCGCACTGGGCATCATCATGCTCGGTCTCGGA 127 

Qy 151 TGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGGGCATTGCTGTG 210 

Mill II I I II I I II 

Db 128 CTGTCCCTGACCCCGGCCGACTTCCTCCGCGTGGTGCGCTACCCGAAGCCGGTGCTGGTC 187 

Qy 211 GGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGGCCATTAGCTTT 270 

II III MM III II II MM I I II II III MM 

Db 188 GGCCTGGTGTGCCAGATCGTCCTGCTGCCCCTGGCCTGTTTCCTGATCGTCCAGGGCTTC 247 

Qy 271 TCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCCCGGGGGGCACC 330 

I I II 1 1 I I 1 1 II I I llllll II I I M I II I II 

Db 248 GCCCTGGAGGCGGCCCTGGCGGTCGGCATGATGTTGCTGGCGGCCTCGCCCGGCGGCACC 307 

Qy 331 ATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCATCAGTATGACA 390 

I I I III II II I I II MINI III II I Mill 1 1 1 1 



Db 308 ACCGCCAACCTCTACAGCCACCTGGCGCATGGCGACGTGGCACTGAACATCACCCTGACC 367 

Qy 391 ACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATCTCT 442 

I II III INN I Mill II II Mill 

Db 368 GCGGTGAACTCGGTGATCGCCATCCTCACCATGCCGCTGATCGTCAATCTGT 419 



RESULT 10 

US-09-252-991A-7300/c 

; Sequence 7300, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 

APPLICANT: Marc J. Rubenfield et al. 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS 

; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: 107196.136 

CURRENT APPLICATION NUMBER: US/09/252,991A 
; CURRENT FILING DATE: 1999-02-18 

PRIOR APPLICATION NUMBER: US 60/074,788 

PRIOR FILING DATE: 1998-02-18 
; PRIOR APPLICATION NUMBER: US 60/094,190 

PRIOR FILING DATE: 1998-07-27 
; NUMBER OF SEQ ID NOS: 33142 
; SEQ ID NO 7300 
LENGTH: 1008 
TYPE: DNA 

ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-7300 



Query Match 5.6%; Score 64; DB 4; Length 1008; 

Best Local Similarity 48.9%; Pred. No. 6.3e-10; 

Matches 172; Conservative 0; Mismatches 180; Indels 0; Gaps 0; 



Qy 91 GAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGTTCTCTTTGGGA 150 

Mill III Mi ll 1 MM 1 Mill II 1 1 III 

Db 891 GATCCCATCCTGACCCTGTTCCTCCCCATCGCACTGGGCATCATCATGCTCGGTCTCGGA 832 

Qy 151 TGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGGGCATTGCTGTG 210 

III II II 1 1 II 1 1 II 

Db 831 CTGTCCCTGACCCCGGCCGACTTCCTCCGCGTGGTGCGCTACCCGAAGCCGGTGCTGGTC 772 

Qy 211 GGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGGCCATTAGCTTT 270 

II III 1 llllll 1 1 1 1 MM II II II 1 1 1 MM 

Db 771 GGCCTGGTGTGCCAGATCGTCCTGCTGCCCCTGGCCTGTTTCCTGATCGTCCAGGGCTTC 712 

Qy 271 TCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCCCGGGGGGCACC 330 

MM M 1 Ml II II 1 1 1 III M II II llllll 

Db 711 GCCCTGGAGGCGGCCCTGGCGGTCGGCATGATGTTGCTGGCGGCCTCGCCCGGCGGCACC 652 

Qy 331 ATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCATCAGTATGACA 390 

1 1 1 Ml 1 1 III 1 M III III Ml II 1 Mill MM 

Db 651 ACCGCCAACCTCTACAGCCACCTGGCGCATGGCGACGTGGCACTGAACATCACCCTGACC 592 

Qy 391 ACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATCTCT 442 

1 Mill MM 1 1 Mill II 1 1 Ml 1 

Db 591 GCGGTGAACTCGGTGATCGCCATCCTCACCATGCCGCTGATCGTCAATCTGT 540 





RESULT 11 
US-09-833-381-318 

Sequence 318, Application US/09833381 
Patent No. 6672186 
GENERAL INFORMATION: 
APPLICANT: Robison, Keith E. 

TITLE OF INVENTION: No. 6672186el Nucleic Acid and Protein Homologs 
FILE REFERENCE: 5800-119 

CURRENT APPLICATION NUMBER: US/09/833,381 
CURRENT FILING DATE: 2001-04-11 
PRIOR APPLICATION NUMBER: 09/516,448 
PRIOR FILING DATE: 2000-02-29 
NUMBER OF SEQ ID NOS: 2050 
SOFTWARE: FastSEQ for Windows Version 3.0 
SEQ ID NO 318 
LENGTH: 374 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 

NAME/KEY: misc_feature 
LOCATION: (1)...(374) 
OTHER INFORMATION: n = A,T,C or G 
US-09-833-381-318 

Query Match 5.2%; Score 58.6; DB 4; Length 374; 

Best Local Similarity 57.8%; Pred. No. 1.7e-08; 

Matches 141; Conservative 0; Mismatches 100; Indels 3; Gaps 2; 

Qy      80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139
           ||| ||  ||  |    ||  | |     |||  |||  |  |  | |   || | ||||
Db     131 ATGCAATTCTCAATACAGTGATGAGCACTGTGCTCACCATCCTCTTAGCCATGGTGATGT 190

Qy     140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199
           | ||| |||| ||    |||||  |||  ||| |       || || | |||||| ||||
Db     191 TTTCTATGGGGTGCAATGTGGAAGTCCACAAGTTCCTAGGACATATAAAGAGACCATGGG 250

Qy     200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTT--TACAGCTTATCTCCT 257
           | ||    |||||  | ||||| ||||||||  |||||||| |     ||  | |   | 
Db     251 GTATCTTCGTGGGCTTCCTCTGTCAGTTTGGAATCATGCCTCTCCACAAGGCTTTTATCC 310

Qy     258 GGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCT-GTTCTCATCATGGGCTGCT 316
            | |  | || | |   |    || || || ||| | | | || || || ||||| ||||
Db     311 TGTCTGTGGCCTCTGNATCCTTCCTGTACAGGCTGTAGTTGGTGCTAATTATGGGTTGCT 370

Qy     317 GCCC 320
           ||||
Db     371 GCCC 374



RESULT 12 
US-09-540-236-963 

; Sequence 963, Application US/09540236 
; Patent No. 6673910 
; GENERAL INFORMATION: 

; APPLICANT: Gary L. Breton et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO MORAXELLA CATARRHALIS
; TITLE OF INVENTION: FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 2709.2005-001
; CURRENT APPLICATION NUMBER: US/09/540,236
; CURRENT FILING DATE: 2000-04-04
; NUMBER OF SEQ ID NOS: 3840
; SEQ ID NO 963
; LENGTH: 972
; TYPE: DNA
; ORGANISM: M. catarrhalis
US-09-540-236-963 

Query Match 5.1%; Score 58.2; DB 4; Length 972; 

Best Local Similarity 48.4%; Pred. No. 4.2e-08; 

Matches 162; Conservative 0; Mismatches 173; Indels 0; Gaps 0; 

Qy     122 TGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGC 181
           || | ||  |  ||||| |     |||| |   || ||    ||    |  |  |     
Db     152 TGCTTGGCATCGTCATGCTTGGCATGGGTTTGACCTTGACTTTCAAAGATTTTGGTGAAG 211

Qy     182 ACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTT 241
            || ||  |  |||  ||   |   ||| ||  |  ||   || | ||   | |||||  
Db     212 TCACCAAAAACCCCAAGGCGGTGATTGTTGGCGTTATCCTTCAATATGTTGTGATGCCAG 271

Qy     242 TTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTC 301
           | |  || | | |  |||        |||   ||    || |     ||||| | |||  
Db     272 TCATTGCCTTTTTGTTGGTTCAAGCATTTAGGCTACCACCTGATTTGGCTATCGGTGTCA 331

Qy     302 TCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATG 361
           || |  | ||||||||||| || ||||||   || ||  |  |||| ||    |   | |
Db     332 TCTTAGTCGGCTGCTGCCCTGGCGGCACCTCGTCAAATGTCATCACTTTTCTTGCCAAAG 391

Qy     362 GAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGA 421
           |  |||  | | |     |   |   || ||    |||||  |    ||||      |||
Db     392 GCAATACCGCTTTATCAGTTGCTTGCACGACACTCTCCACACTCTTAGCCCCTATTTTGA 451

Qy     422 TGCCACTCTGCATTTATCTCTACACCTGGTCCTGG 456
             |||       ||||| | |   || |    |||
Db     452 CACCAGCTGTATTTTATTTATTTGCCAGCCAATGG 486



RESULT 13 

US-09-596-002-41/C 

; Sequence 41, Application US/09596002 
; Patent No. 6632636
; GENERAL INFORMATION:
; APPLICANT: Lagace, Robert, E.
; APPLICANT: Patterson, Chandra
; APPLICANT: Berg, Kim, L.
; TITLE OF INVENTION: NUCLEOTIDE SEQUENCES OF MORAXELLA CATARRHALIS GENOME
; FILE REFERENCE: PM-0008-4 US
; CURRENT APPLICATION NUMBER: US/09/596,002
; CURRENT FILING DATE: 2000-06-16
; PRIOR APPLICATION NUMBER: 60/140,121
; PRIOR FILING DATE: 1999-06-18
; NUMBER OF SEQ ID NOS: 41
; SOFTWARE: PERL Program
; SEQ ID NO 41
; LENGTH: 269223
; TYPE: DNA
; ORGANISM: Moraxella catarrhalis
; FEATURE:
; NAME/KEY: misc_feature
; OTHER INFORMATION: Incyte template ID No. 6632636 41
; PUBLICATION INFORMATION:
US-09-596-002-41 

Query Match 5.1%; Score 58.2; DB 4; Length 269223; 

Best Local Similarity 48.4%; Pred. No. 1.4e-06; 

Matches 162; Conservative 0; Mismatches 173; Indels 0; Gaps 0; 

Qy     122 TGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGC 181
           || | ||  |  ||||| |     |||| |   || ||    ||    |  |  |     
Db  183238 TGCTTGGCATCGTCATGCTTGGCATGGGTTTAACCTTGACTTTCAAAGATTTTGGTGAAG 183179

Qy     182 ACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTT 241
            || ||  |  |||  ||   |   | | ||  |  ||   || | ||   | |||||  
Db  183178 TCACCAAAAACCCCAAGGCGGTGATTATTGGCGTTATCCTTCAATATGTTGTGATGCCAG 183119

Qy     242 TTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTC 301
           | |  || | | |  |||        |||   ||    || |     ||||| | |||  
Db  183118 TCATTGCCTTTTTGTTGGTTCAAGCATTTAGGCTACCACCTGATTTGGCTATCGGTGTCA 183059

Qy     302 TCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATG 361
           || |  | ||||||||||| || ||||||   || ||  |  |||| ||    |   | |
Db  183058 TCTTAGTCGGCTGCTGCCCTGGCGGCACCTCGTCAAATGTCATCACTTTTCTTGCCAAAG 182999

Qy     362 GAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGA 421
           |  |||  | | |     |   |   || ||    |||||  |    ||||      |||
Db  182998 GCAATACCGCTTTATCAGTTGCTTGCACGACACTCTCCACACTCTTAGCCCCTATTTTGA 182939

Qy     422 TGCCACTCTGCATTTATCTCTACACCTGGTCCTGG 456
            ||||       ||||| | |   || |    |||
Db  182938 CGCCAGCTGTATTTTATTTATTTGCCAGCCAATGG 182904
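
Result 13 carries a /C suffix and its Db coordinates run downward (183238 to 182904) while the Qy coordinates run upward. That is consistent with a hit on the complementary strand, with the Db row printed as the reverse complement of the stored sequence. The following Python sketch illustrates that reading under exactly that assumption; the function names and the toy sequence are hypothetical and are not part of the search output.

```python
# Complement table for the four bases plus N (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def displayed_db_row(forward: str, start: int, end: int) -> str:
    """Return the Db row text for 1-based, inclusive coordinates.

    start <= end: forward-strand hit, the row is a plain slice.
    start >  end: complement hit (the assumed meaning of descending
                  coordinates); the row is the reverse complement of
                  positions end..start of the stored sequence.
    """
    if start <= end:
        return forward[start - 1:end]
    return reverse_complement(forward[end - 1:start])


toy = "AAACCCGGGT"                    # hypothetical stored (forward) sequence
print(displayed_db_row(toy, 3, 6))    # positions 3..6 as stored
print(displayed_db_row(toy, 6, 3))    # the same positions read from the other strand

# The last Db row of Result 13 spans 182938 down to 182904, i.e. 35 residues,
# the same length as the Qy row 422..456 it is aligned with.
print(182938 - 182904 + 1, 456 - 422 + 1)
```

Under this reading, a descending Db range never needs a separate strand flag: the order of the two coordinates already says which strand the row was taken from.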



RESULT 14 

US-09-252-991A-1376/c

; Sequence 1376, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 

; APPLICANT: Marc J. Rubenfield et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18
; PRIOR APPLICATION NUMBER: US 60/074,788
; PRIOR FILING DATE: 1998-02-18
; PRIOR APPLICATION NUMBER: US 60/094,190
; PRIOR FILING DATE: 1998-07-27
; NUMBER OF SEQ ID NOS: 33142
; SEQ ID NO 1376
; LENGTH: 891
; TYPE: DNA
; ORGANISM: Pseudomonas aeruginosa
US-09-252-991A-1376

Query Match 4.9%; Score 55.8; DB 4; Length 891; 

Best Local Similarity 47.1%; Pred. No. 2.3e-07; 

Matches 171; Conservative 0; Mismatches 192; Indels 0; Gaps 0;

Qy      94 CTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGTTCTCTTTGGGATGT 153
           |||    |||| |  |    | |     || | || ||| | ||||||    ||||    
Db     620 CTCCCGCTCACCGCAGCCATCGCGCCACTGCTCGGCCTGGTGATGTTCGGCATGGGCCTG 561

Qy     154 TCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGGGCATTGCTGTGGGA 213
            |  |  ||  |    |  |  |   |  |  | ||   |||    |  |     | || 
Db     560 ACGCTCAAGGGCGAAGACTTCCGCGAGGTCGCCCGGCACCCCATACGGGTGCTGATCGGC 501

Qy     214 CTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCT 273
            ||||   |||||| |   |||||||       || |   | ||   ||     ||    
Db     500 GTGCTGGCCCAGTTCGTCATCATGCCCGGCCTGGCCTGGTTGCTCTGCAGCCTGTTGCAG 441

Qy     274 CTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCCCGGGGGGCACCATC 333
            ||  | | |     ||  | |  ||  || |  | ||||||||||| || ||||||   
Db     440 TTGCCGGCGGAGATCGCGGTGGGCGTGATCCTGGTCGGCTGCTGCCCCGGCGGCACCGCT 381

Qy     334 TCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCATCAGTATGACAACC 393
           || ||| |  | ||||    |     ||| ||| | |  ||     |    || ||  | 
Db     380 TCCAACGTGATGACCTGGCTGTCCCGTGGCGATGTCGCCCTGTCGGTGGCGATCACCTCG 321

Qy     394 TGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATCTCTACACCTGGTCC 453
               ||||| ||  ||||| |    | | |||        | |  ||     | | | ||
Db     320 GTGACCACCCTGCTCGCCCCGCTGGTCACGCCGGCGCTGGTCTGGCTGCTGGCTTCGGCC 261

Qy     454 TGG 456
           |||
Db     260 TGG 258
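
The three figures printed above each alignment can be cross-checked against the column counts. The Python sketch below is a worked calculation, not GenCore code. It assumes that Query Match is the score divided by the perfect score (1134), that Best Local Similarity is matches divided by all alignment columns, and that each match scores 1 and each gap costs Gapop plus Gapext per gapped position. The mismatch score of -0.6 is inferred from the numbers in this report rather than taken from any documentation, and the function name is made up for illustration.

```python
def alignment_stats(matches, mismatches, indels, gaps, perfect_score,
                    mismatch_score=-0.6, gap_open=10.0, gap_ext=1.0):
    """Return (score, query match %, best local similarity %), rounded to 0.1.

    mismatch_score is an inferred value; gap_open/gap_ext mirror the
    Gapop/Gapext header fields, with the assumed cost open + ext * length.
    """
    columns = matches + mismatches + indels
    score = (matches + mismatch_score * mismatches
             - gap_open * gaps - gap_ext * indels)
    return (round(score, 1),
            round(100.0 * score / perfect_score, 1),
            round(100.0 * matches / columns, 1))


# Result 12: Matches 162; Mismatches 173; Indels 0; Gaps 0
print(alignment_stats(162, 173, 0, 0, 1134))
# Result 14: Matches 171; Mismatches 192; Indels 0; Gaps 0
print(alignment_stats(171, 192, 0, 0, 1134))
```

With these assumptions the two calls give the Score, Query Match and Best Local Similarity values printed for Results 12 and 14. The gapped Result 3 of the published-applications search later in this report (803 matches, 77 mismatches, 51 indels in 5 gaps) comes out at its reported 655.8 as well. Results containing ambiguous bases such as N may deviate slightly, since how N is scored is not stated here.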



RESULT 15 

US-09-252-991A-1144

; Sequence 1144, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 

; APPLICANT: Marc J. Rubenfield et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18
; PRIOR APPLICATION NUMBER: US 60/074,788
; PRIOR FILING DATE: 1998-02-18
; PRIOR APPLICATION NUMBER: US 60/094,190
; PRIOR FILING DATE: 1998-07-27
; NUMBER OF SEQ ID NOS: 33142
; SEQ ID NO 1144
; LENGTH: 948
; TYPE: DNA
; ORGANISM: Pseudomonas aeruginosa
US-09-252-991A-1144



Query Match 4.9%; Score 55.8; DB 4; Length 948;

Best Local Similarity 47.1%; Pred. No. 2.4e-07; 

Matches 171; Conservative 0; Mismatches 192; Indels 0; Gaps 0; 

Qy      94 CTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGTTCTCTTTGGGATGT 153
           |||    |||| |  |    | |     || | || ||| | ||||||    ||||    
Db     112 CTCCCGCTCACCGCAGCCATCGCGCCACTGCTCGGCCTGGTGATGTTCGGCATGGGCCTG 171

Qy     154 TCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGGGCATTGCTGTGGGA 213
            |  |  ||  |    |  |  |   |  |  | ||   |||    |  |     | || 
Db     172 ACGCTCAAGGGCGAAGACTTCCGCGAGGTCGCCCGGCACCCCATACGGGTGCTGATCGGC 231

Qy     214 CTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCT 273
            ||||   |||||| |   |||||||       || |   | ||   ||     ||    
Db     232 GTGCTGGCCCAGTTCGTCATCATGCCCGGCCTGGCCTGGTTGCTCTGCAGCCTGTTGCAG 291

Qy     274 CTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCCCGGGGGGCACCATC 333
            ||  | | |     ||  | |  ||  || |  | ||||||||||| || ||||||   
Db     292 TTGCCGGCGGAGATCGCGGTGGGCGTGATCCTGGTCGGCTGCTGCCCCGGCGGCACCGCT 351

Qy     334 TCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCATCAGTATGACAACC 393
           || ||| |  | ||||    |     ||| ||| | |  ||     |    || ||  | 
Db     352 TCCAACGTGATGACCTGGCTGTCCCGTGGCGATGTCGCCCTGTCGGTGGCGATCACCTCG 411

Qy     394 TGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATCTCTACACCTGGTCC 453
               ||||| ||  ||||| |    | | |||        | |  ||     | | | ||
Db     412 GTGACCACCCTGCTCGCCCCGCTGGTCACGCCGGCGCTGGTCTGGCTGCTGGCTTCGGCC 471

Qy     454 TGG 456
           |||
Db     472 TGG 474



Search completed: March 25, 2004, 18:55:43 
Job time : 131 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 
Run on: March 25, 2004, 16:28:48 ; Search time 491 Seconds
(without alignments)
8598.560 Million cell updates/sec



Title: US-10-091-628-1
Perfect score: 1134
Sequence: 1 atgagagccaattgttccag..........acatcacttcatgtgaatag 1134



Scoring table: IDENTITY_NUC 

Gapop 10.0 , Gapext 1.0 

Searched: 2458946 seqs, 1861504846 residues 

Total number of hits satisfying chosen parameters: 4917892



Minimum DB seq length: 0
Maximum DB seq length: 2000000000



Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 
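
The throughput figure in this header can be cross-checked from the other header values. The Python sketch below is a worked calculation, not part of the GenCore output. It assumes that one "cell update" is one query residue compared against one database residue, and that both strands were searched (a factor of 2, which the hit count of 4917892 = 2 x 2458946 also suggests). The function name is hypothetical.

```python
def million_cell_updates_per_sec(query_len, db_residues, seconds, strands=2):
    """Dynamic-programming cells filled per second, in millions.

    strands=2 is an assumption: forward and complement both searched.
    """
    cells = strands * query_len * db_residues
    return cells / seconds / 1e6


# Values copied from the header of this search:
# Perfect score 1134 (query length), 1861504846 residues, 491 seconds.
rate = million_cell_updates_per_sec(1134, 1861504846, 491)
print(f"{rate:.3f} Million cell updates/sec")
```

Under these assumptions the result agrees with the 8598.560 printed in the header to three decimals, which supports the both-strands reading of the hit count.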



Database 



Published_Applications_NA:*
1: /cgn2_6/ptodata/2/pubpna/US07_PUBCOMB.seq:*
2: /cgn2_6/ptodata/2/pubpna/PCT_NEW_PUB.seq:*
3: /cgn2_6/ptodata/2/pubpna/US06_NEW_PUB.seq:*
4: /cgn2_6/ptodata/2/pubpna/US06_PUBCOMB.seq:*
5: /cgn2_6/ptodata/2/pubpna/US07_NEW_PUB.seq:*
6: /cgn2_6/ptodata/2/pubpna/PCTUS_PUBCOMB.seq:*
7: /cgn2_6/ptodata/2/pubpna/US08_NEW_PUB.seq:*
8: /cgn2_6/ptodata/2/pubpna/US08_PUBCOMB.seq:*
9: /cgn2_6/ptodata/2/pubpna/US09A_PUBCOMB.seq:*
10: /cgn2_6/ptodata/2/pubpna/US09B_PUBCOMB.seq:*
11: /cgn2_6/ptodata/2/pubpna/US09C_PUBCOMB.seq:*
12: /cgn2_6/ptodata/2/pubpna/US09_NEW_PUB.seq:*
13: /cgn2_6/ptodata/2/pubpna/US10A_PUBCOMB.seq:*
14: /cgn2_6/ptodata/2/pubpna/US10B_PUBCOMB.seq:*
15: /cgn2_6/ptodata/2/pubpna/US10C_PUBCOMB.seq:*
16: /cgn2_6/ptodata/2/pubpna/US10_NEW_PUB.seq:*
17: /cgn2_6/ptodata/2/pubpna/US60_NEW_PUB.seq:*
18: /cgn2_6/ptodata/2/pubpna/US60_PUBCOMB.seq:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
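
The header of this search names a Smith-Waterman ("sw") model with the IDENTITY_NUC table, Gapop 10.0 and Gapext 1.0. As a rough illustration of what such a search computes for a single query/database pair, here is a minimal Python sketch of local alignment scoring with affine gaps (Gotoh's recurrence). It is not Compugen's implementation. The match score of 1 and the gap cost of Gapop + Gapext x length are assumptions, and the mismatch score of -0.6 is inferred from the ungapped results in this report (for example 162 matches and 173 mismatches giving a score of 58.2). Ambiguous bases such as N get no special treatment here.

```python
def sw_affine(query, target, match=1.0, mismatch=-0.6,
              gap_open=10.0, gap_ext=1.0):
    """Best local alignment score of query vs target (score only).

    A gap of length L costs gap_open + gap_ext * L (assumed convention).
    """
    query, target = query.upper(), target.upper()
    neg = float("-inf")
    n = len(target)
    h_prev = [0.0] * (n + 1)   # best score of an alignment ending at (i-1, j)
    e_prev = [neg] * (n + 1)   # same, but ending in a gap that skips query[i-1]
    best = 0.0
    for i in range(1, len(query) + 1):
        h_cur = [0.0] * (n + 1)
        e_cur = [neg] * (n + 1)
        f = neg                # ending in a gap that skips target[j-1]
        for j in range(1, n + 1):
            # Open a new gap from H, or extend the gap already running.
            e_cur[j] = max(h_prev[j] - gap_open - gap_ext,
                           e_prev[j] - gap_ext)
            f = max(h_cur[j - 1] - gap_open - gap_ext, f - gap_ext)
            s = match if query[i - 1] == target[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            h = max(0.0, h_prev[j - 1] + s, e_cur[j], f)
            h_cur[j] = h
            best = max(best, h)
        h_prev, e_prev = h_cur, e_cur
    return best


print(sw_affine("ACGT", "ACGT"))      # identical sequences
print(sw_affine("AAAA", "CCCC"))      # nothing in common
# Two 50-base flanks separated by a 20-base insertion in the target:
print(sw_affine("A" * 50 + "C" * 50, "A" * 50 + "G" * 20 + "C" * 50))
```

The sketch keeps only two rows of the dynamic-programming matrix, which is enough for a score but not for printing the alignment itself. A full search repeats this for every database sequence and, presumably, for both strands.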
    7   173.6   15.3     1580   9  US-09-880-107-2176      Sequence 2176, Ap




ft 

o 


173 6 


15 


3 


1 Sfi 0 

X >J <J VJ 


14 


US-10-2 88-222A-15 


Sequence 15, Appl 




Q 


141 4 

X *± X • *± 


12 


. 5 


1988 


15 


US-1 0-0 8 5- 19 8- 113 


Sequence 113, App 


c  10    79.8    7.0      360   9  US-09-864-761-31375     Sequence 31375, A


c 


1 1 
x x 


7 9 R 


7 


o 


560 


9 


tjs-09-8 64-7 61-14 84 7 


Sequence 14847, A 


c 


X Z 


77 ft 




m 9 


310 


9 


us _ 09- 8 33- 3 8 1-3 17 


Sequence 317, App 






74 


6 


. 5 


401 


9 


Us- 09- 9 60- 3 52 -22 5 3 


Sequence 2253, Ap 




1 4 

X *± 


67 6 


5 


. 0 


972 


9 


Us- 09-7 3 8- 62 6- 2 554 


Sequence 2554, Ap 


c 


X o 


67 6 


6 


n 

• u 


3309400 


9 


Us- 09- 7 3 8- 62 6-1 


Sequence 1, Appli 




1 6 

X D 


64 4 




7 


i m 7 

X U X / 


9 


US-09-938-842A-380 


Sequence 38 0, App 




1 7 
X / 


64 4 


5 


. 7 


1017 


11 


US-09-938-842A-38 0 


Sequence 38 0, App 




1 p 


69 4 

DZ . H. 


s 




1 1 S9 

X X OZ. 


1 S 

X 


ttc_i 0-093-463-21 


Sequence 21, Appl 




1 Q 

X z) 


69 4 

OZ . *i 


-J 


. J 


1 1 S9 

X X J z 


15 


US- 10- 093- 4 63-2 5 


Sequence 25, Appl 




9 n 

Z U 


69 4 
DZ . ft 


-J 


• -J 


1117 

X O X / 


1 3 
x o 


US- 10- 09 1-62 8 -4 


Sequence 4, Appli 




9 1 
Z X 


69 4 


-J 


s 


1355 


15 


US- 10- 09 3-4 63-2 3 


Sequence 23, Appl 




99 
z z 


69 4 


s 

•J 


s 


1777 


13 


US- 10- 09 1-62 8-6 


Sequence 6, Appli 




9 ^ 
Z O 


69 4 
DZ . I 


R 


. «j 


9S90 

Z. J Z VJ 


1 S 

X — > 


tjs-10-108-2 60A-919 


Sequence 919, App 




9/1 

Z *i 


Sft 6 

3 O « D 




9 
. Z 


174 


9 


Us-O 9- 8 33-3 8 1-318 


Sequence 318, App 




9 R 
Z 3 


S4 9 
. z 


4 


Q 
. O 


4 07 


9 


US-0 9-9 60-352- 100 81 


Sequence 10081, A 




9 6 

Z D 


6 


4 


7 


ions 

X U V — 1 


9 


US- 09- 7 3 8- 62 6- 13 9 2 


Sequence 1392, Ap 




97 
z / 


6 


4 


# 7 


3309400 

J J W J T U u 


9 


US- 09-7 3 8- 62 6-1 


Sequence 1, Appli 




9 ft 
z o 


9 


4 


7 


418 


9 


US-0 9-960-3 52-4 473 


Sequence 4473, Ap 


c 


9 Q 
Z .? 


4 4 6 




Q 


1 97 

X J? J 


9 


US- 0 9- 8 64 -7 61- 3 012 8 


Sequence 30128, A 


c 




4 4 6 


O 


q 


60 0 


9 


uq- 09- 8 64-761-13589 


Sequence 13589, A 




O X 


4 "3 9 


-5 


ft 


1425 


10 


us- 09- 7 9 6- 7 5 3- 61 


Sequence 61, Appl 




^9 
o z 


49 6 


3 


. 8 


912 


9 


Us- 09- 974- 300- 682 3 


Sequence 6823, Ap 






4 9 4 

4Z • 4 




7 


4 79 


12 


tjs-1 0-424-599-41790 


Sequence 41790, A 


c 


"34 


"39 


-5 


4 


592 


10 


US- 09- 9 02 -563-5 


Sequence 5, Appli 


c 


O .J 


QQ 

O -7 


3 


m 4 


592 


14 


US- 10- 096-255-5 


Sequence 5, Appli 


c 


^6 


-5Q 


3 


p 4 


5403 


10 


Us- 09- 902 -5 63 -3 


Sequence 3, Appli 


<— 


37 


39 


3 


. 4 


5403 


14 


US-10-096-255-3 


Sequence 3, Appli 




^ft 


^7 ft 
o / . o 




• ^> 


65 


10 


Us- 09- 9 08- 97 5-2 662 9 


Sequence 26629, A 




?9 


17 ft 




3 


1845 


12 


US-10-282-122A-31931 


Sequence 31931, A 




   40    37.4    3.3      738  12  US-10-424-599-34013     Sequence 34013, A
c  41    36.8    3.2     3295  12  US-10-383-241B-5        Sequence 5, Appli
   42    36.6    3.2     1306  12  US-10-424-599-79635     Sequence 79635, A
   43    36.4    3.2     1275  14  US-10-113-113-3         Sequence 3, Appli
c  44    36      3.2     2457  15  US-10-094-749-438       Sequence 438, App
c  45    36      3.2     2524  10  US-09-814-353-21076     Sequence 21076, A



ALIGNMENTS 



RESULT 1 
US-10-091-628-1 

; Sequence 1, Application US/10091628 
; Publication No. US20020164627A1 
; GENERAL INFORMATION:
; APPLICANT: Wilganowski, Nathaniel L.
; APPLICANT: Nepomnichy, Boris
; APPLICANT: Burnett, Michael B.
; APPLICANT: Hu, Yi
; TITLE OF INVENTION: Novel Human Transporter Proteins and Polynucleotides Encoding the
; TITLE OF INVENTION: Same
; FILE REFERENCE: LEX-0314-USA
; CURRENT APPLICATION NUMBER: US/10/091,628
; CURRENT FILING DATE: 2002-03-06
; PRIOR APPLICATION NUMBER: US 60/275,009
; PRIOR FILING DATE: 2001-03-12
; PRIOR APPLICATION NUMBER: US 60/284,152
; PRIOR FILING DATE: 2001-04-17
; NUMBER OF SEQ ID NOS: 6
; SOFTWARE: FastSEQ for Windows Version 4.0
; SEQ ID NO 1
; LENGTH: 1134
; TYPE: DNA
; ORGANISM: Homo sapiens
US-10-091-628-1 

Query Match 100.0%; Score 1134; DB 13; Length 1134; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 1134; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy       1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60

Qy      61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120

Qy     121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180

Qy     181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240

Qy     241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300

Qy     301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360

Qy     361 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 420
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     361 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 420

Qy     421 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 480
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     421 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 480

Qy     481 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 540
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     481 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 540

Qy     541 TATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTT 600
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     541 TATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTT 600

Qy     601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 660
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 660

Qy     661 AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 720
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     661 AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 720

Qy     721 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 780
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     721 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 780

Qy     781 GAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCT 840
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     781 GAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCT 840

Qy     841 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 900
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     841 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 900

Qy     901 GGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGA 960
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     901 GGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGA 960

Qy     961 AAAAAGAACTCAGGTTGCACAGAAGTCTGCCATACGAGGAAATCGACTTCTTCCAGAGAG 1020
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     961 AAAAAGAACTCAGGTTGCACAGAAGTCTGCCATACGAGGAAATCGACTTCTTCCAGAGAG 1020

Qy    1021 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1080
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1021 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1080

Qy    1081 ATGGATTGCCACAGGGCTCTCGAGCCAGTTGGCCACATCACTTCATGTGAATAG 1134
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1081 ATGGATTGCCACAGGGCTCTCGAGCCAGTTGGCCACATCACTTCATGTGAATAG 1134


RESULT 2 
US-10-091-628-3 

; Sequence 3, Application US/10091628 
; Publication No. US20020164627A1
; GENERAL INFORMATION:
; APPLICANT: Wilganowski, Nathaniel L.
; APPLICANT: Nepomnichy, Boris
; APPLICANT: Burnett, Michael B.
; APPLICANT: Hu, Yi
; TITLE OF INVENTION: Novel Human Transporter Proteins and Polynucleotides Encoding the
; TITLE OF INVENTION: Same
; FILE REFERENCE: LEX-0314-USA
; CURRENT APPLICATION NUMBER: US/10/091,628
; CURRENT FILING DATE: 2002-03-06
; PRIOR APPLICATION NUMBER: US 60/275,009
; PRIOR FILING DATE: 2001-03-12
; PRIOR APPLICATION NUMBER: US 60/284,152
; PRIOR FILING DATE: 2001-04-17
; NUMBER OF SEQ ID NOS: 6
; SOFTWARE: FastSEQ for Windows Version 4.0
; SEQ ID NO 3
; LENGTH: 1600
; TYPE: DNA
; ORGANISM: Homo sapiens
US-10-091-628-3 

Query Match 100.0%; Score 1134; DB 13; Length 1600; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 1134; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 



Qy       1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     194 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 253

Qy      61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     254 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 313

Qy     121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     314 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 373

Qy     181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     374 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 433

Qy     241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     434 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 493

Qy     301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     494 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 553

Qy     361 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 420
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     554 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 613

Qy     421 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 480
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     614 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 673

Qy     481 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 540
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     674 CCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTC 733

Qy     541 TATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTT 600
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     734 TATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTT 793

Qy     601 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 660
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     794 GGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGG 853

Qy     661 AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 720
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     854 AATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACG 913

Qy     721 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 780
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     914 GGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTA 973

Qy     781 GAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCT 840
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     974 GAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCT 1033

Qy     841 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 900
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1034 GAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGAT 1093

Qy     901 GGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGA 960
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1094 GGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGA 1153

Qy     961 AAAAAGAACTCAGGTTGCACAGAAGTCTGCCATACGAGGAAATCGACTTCTTCCAGAGAG 1020
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1154 AAAAAGAACTCAGGTTGCACAGAAGTCTGCCATACGAGGAAATCGACTTCTTCCAGAGAG 1213

Qy    1021 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1080
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1214 ACCAATGCCTTCTTGGAGGTGAATGAAGAAGGTGCCATCACTCCTGGGCCACCAGGGCCA 1273

Qy    1081 ATGGATTGCCACAGGGCTCTCGAGCCAGTTGGCCACATCACTTCATGTGAATAG 1134
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1274 ATGGATTGCCACAGGGCTCTCGAGCCAGTTGGCCACATCACTTCATGTGAATAG 1327





RESULT 3 

US-09-981-151A-11 

Sequence 11, Application US/09981151A 
Publication No. US20030212256A1 
GENERAL INFORMATION: 
APPLICANT: Edinger, Shlomit R 
APPLICANT: Gerlach, Valerie 
APPLICANT: MacDougall, John R 
APPLICANT: Malyankar, Muriel M 
APPLICANT: Smithson, Glennda 
APPLICANT: Millet, Isabelle 
APPLICANT: Peyman, John A 
APPLICANT: Stone, David J 
APPLICANT: Gunther, Erik 
APPLICANT: Ellerman, Karen 
APPLICANT: Shimkets, Richard A 
APPLICANT: Padigaru, Muralidhara 
APPLICANT: Guo, Xiaojia 



APPLICANT: Patturajan, Meera 
APPLICANT: Taupier Jr, Raymond J 
APPLICANT: Burgess, Catherine E 
APPLICANT: Zerhusen, Bryan D 
APPLICANT: Kekuda, Ramesh 
APPLICANT: Spytek, Kimberly A 
APPLICANT: Gangolli, Esha A 
APPLICANT: Fernandes, Elma R 
APPLICANT: Gorman, Linda 

TITLE OF INVENTION: Proteins and Nucleic Acids Encoding Same 
FILE REFERENCE: 21402-168 

CURRENT APPLICATION NUMBER: US/09/981,151A
CURRENT FILING DATE: 2001-10-16 
PRIOR APPLICATION NUMBER: 60/241,040 
PRIOR FILING DATE: 2000-10-17 
PRIOR APPLICATION NUMBER: 60/241,058 
PRIOR FILING DATE: 2000-10-17 
PRIOR APPLICATION NUMBER: 60/241,063 
PRIOR FILING DATE: 2000-10-17 
PRIOR APPLICATION NUMBER: 60/241,243 
PRIOR FILING DATE: 2000-10-17 
PRIOR APPLICATION NUMBER: 60/242,152 
PRIOR FILING DATE: 2000-10-20 
PRIOR APPLICATION NUMBER: 60/242,482 
PRIOR FILING DATE: 2000-10-23 
PRIOR APPLICATION NUMBER: 60/242,611 
PRIOR FILING DATE: 2000-10-23 
PRIOR APPLICATION NUMBER: 60/242,612 
PRIOR FILING DATE: 2000-10-23 
PRIOR APPLICATION NUMBER: 60/242,880 
PRIOR FILING DATE: 2000-10-24 
PRIOR APPLICATION NUMBER: 60/242,881 
PRIOR FILING DATE: 2000-10-24 

Remaining Prior Application data removed - See File Wrapper or PALM. 
NUMBER OF SEQ ID NOS: 160
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 11
LENGTH: 987
TYPE: DNA

ORGANISM: Homo sapiens 
US-09-981-151A-11 

Query Match 57.8%; Score 655.8; DB 11; Length 987; 

Best Local Similarity 86.3%; Pred. No. 4.5e-205; 

Matches 803; Conservative 0; Mismatches 77; Indels 51; Gaps 5; 

Qy     1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     1 ATGAGAGCCAATTGTTCCAGCAGCTCAGCCTGCCCTGCCAACAGTTCAGAGGAGGAGCTG 60

Qy    61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTG 120
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
Db    61 CCAGTGGGACTGGAGGTGCATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTATC 120

Qy   121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   121 ATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCG 180

Qy   181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   181 CACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCT 240

Qy   241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   241 TTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTT 300

Qy   301 CTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 360
         ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
Db   301 CTCATCATGGGCTGCTG-CCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGAT 359

Qy   361 GGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATG 420
         ||||||||||||||||                             |  ||||||||||||
Db   360 GGAGATATGGATCTCA-----------------------------GGTGCCCTGGGAATG 390

Qy   421 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 480
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   391 ATGCCACTCTGCATTTATCTCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATT 450

Qy   481 CCTTATCAGAACA------------TAGGAATTACCCTTGTGTGCCTGACCATTCCTGTG 528
         |||||||||||||            |||||||||||||||||||||||||||||||||||
Db   451 CCTTATCAGAACATAGGTCTGTCTTTAGGAATTACCCTTGTGTGCCTGACCATTCCTGTG 510

Qy   529 GCCTTTGGTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATT 588
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   511 GCCTTTGGTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAA---- 566

Qy   589 GGGGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCG 648
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   567 --GGCCGTTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCG 624

Qy   649 AAAGGATCTTGGAATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATT 708
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   625 AAAGGATCTTGGAATTCAGACATCACCCTTCTGACCATCAGTTTCATCTTTCCTTTGATT 684

Qy   709 GGCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGG 768
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||  |  |
Db   685 GGCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGACCTTG 744

Qy   769 ACAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTA 828
          | || |  ||||    |  ||||   || |  |        | |    |      |
Db   745 CCTATCTTTTTAG---GTTTAGCTTTCAAGACACCCTGTGATACCCTACTCGCAATGACT 801

Qy   829 TCTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTC 888
         || |   |||       |  | ||             | || |||   ||||||||||||
Db   802 TCGTGTCCTGAATGTTCCAGGCTCATCTATGCCTTCATTCCTCTGCTATATGGACTCTTC 861

Qy   889 CAGCTGATAGATGGATTTCTTATTGTTGCAG 919
         |||||||||||||||||||||||||||| ||
Db   862 CAGCTGATAGATGGATTTCTTATTGTTGAAG 892



RESULT 4 

US-10-191-997-110 

; Sequence 110, Application US/10191997
; Publication No. US20030207834A1
; GENERAL INFORMATION:
; APPLICANT: Oligos Etc., Inc.
; APPLICANT: DALE, Roderic M. K.
; APPLICANT: ARROW, Amy
; APPLICANT: THOMPSON, Terry
; TITLE OF INVENTION: Oligonucleotide-Containing Pharmacological Compositions And Their Use
; FILE REFERENCE: 54800-5019
; CURRENT APPLICATION NUMBER: US/10/191,997
; CURRENT FILING DATE: 2002-07-10
; PRIOR APPLICATION NUMBER: US 60/303,820
; PRIOR FILING DATE: 2001-07-10
; NUMBER OF SEQ ID NOS: 132
; SOFTWARE: PatentIn version 3.1
; SEQ ID NO 110
; LENGTH: 3779
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: misc_feature
; OTHER INFORMATION: IBAT: Acc. No. NM_000452
US-10-191-997-110 

Query Match 26.3%; Score 297.8; DB 15; Length 3779; 

Best Local Similarity 58.5%; Pred. No. 1.3e-86; 

Matches 518; Conservative 0; Mismatches 367; Indels 0; Gaps 0; 
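
The three summary lines above appear to be arithmetically related. The sketch below assumes (the listing itself does not state this) that Query Match is the score as a percentage of the perfect score of 1134, and that Best Local Similarity is matches divided by all aligned columns (matches + mismatches + indels). The function names are illustrative only.

```python
# Assumed relationships, inferred from the numbers in this listing:
#   Query Match           = 100 * score / perfect score
#   Best Local Similarity = 100 * matches / (matches + mismatches + indels)

PERFECT_SCORE = 1134  # "Perfect score" reported for query US-10-091-628-1


def query_match_pct(score: float, perfect: float = PERFECT_SCORE) -> float:
    """Score expressed as a percentage of the perfect score."""
    return 100.0 * score / perfect


def best_local_similarity_pct(matches: int, mismatches: int, indels: int) -> float:
    """Identical columns as a percentage of all aligned columns."""
    return 100.0 * matches / (matches + mismatches + indels)


if __name__ == "__main__":
    # Values reported for RESULT 4 (US-10-191-997-110)
    print(round(query_match_pct(297.8), 1))                  # 26.3
    print(round(best_local_similarity_pct(518, 367, 0), 1))  # 58.5
```

Under these assumptions the same two functions also reproduce the percentages printed for the other hits in this listing, for example 16.2% and 53.6% for a score of 183.2 with 430 matches, 363 mismatches and 9 indels.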

Qy    80 ATGGAAACCTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGT 139
         ||   | |||     | ||  | |    ||||   ||  |  || |||   || | ||||
Db   678 ATAACATCCTAAGTGTGGTCCTAAGTACGGTGCTGACCATCCTGTTGGCCTTGGTGATGT 737

Qy   140 TCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGG 199
         ||||  |||||||   |||||| |||  |||  |      |||||| | | | || ||||
Db   738 TCTCCATGGGATGCAACGTGGAAATCAAGAAATTTCTAGGGCACATAAAGCGGCCGTGGG 797

Qy   200 GCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGG 259
         |||||  ||| ||  | ||||| ||||||||  |||||||  | ||||  |   |||||
Db   798 GCATTTGTGTTGGCTTCCTCTGTCAGTTTGGAATCATGCCCCTCACAGGATTCATCCTGT 857

Qy   260 CCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCC 319
         |  |   ||||    |    ||  |||| ||  | |  || ||||| || || |||||||
Db   858 CGGTGGCCTTTGACATCCTCCCGCTCCAGGCCGTAGTGGTGCTCATTATAGGATGCTGCC 917

Qy   320 CGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCA 379
         | || || ||   ||| || || ||  |||  ||||| ||||| || ||||| || |||
Db   918 CTGGAGGAACTGCCTCCAATATCTTGGCCTATTGGGTCGATGGCGACATGGACCTGAGCG 977

Qy   380 TCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATC 439
         |||| ||||| || || |||||  ||   ||||| ||||||||||| || ||| |   |
Db   978 TCAGCATGACCACATGCTCCACACTGCTTGCCCTCGGAATGATGCCGCTGTGCCTCCTTA 1037

Qy   440 TCTACACCTGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAA 499
         |||| |||      |||           ||   ||   ||||| ||| | |||||||| |
Db  1038 TCTATACCAAAATGTGGGTCGACTCTGGGAGCATCGTAATTCCCTATGATAACATAGGTA 1097

Qy   500 TTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGC 559
            | || ||    ||     |||||||  || ||||  | | ||| ||| ||| |||||
Db  1098 CATCTCTGGTTGCTCTCGTTGTTCCTGTTTCCATTGGAATGTTTGTTAATCACAAATGGC 1157

Qy   560 CAAAACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCTTCTGG 619
         |  || || | || ||||| || || |||||| || | |  || |   ||||| || ||
Db  1158 CCCAAAAAGCAAAGATCATACTTAAAATTGGGTCCATCGCGGGCGCCATCCTCATTGTGC 1217

Qy   620 TGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCTTC 679
         |  | || || | ||| |   |  ||    || |  | ||||     |    ||  ||
Db  1218 TCATAGCTGTGGTTGGAGGAATATTGTACCAAAGCGCCTGGATCATTGCTCCCAAACTGT 1277

Qy   680 TGACCATCAGTTTCATCTTTCCTTTGATTGGCCATGTCACGGGTTTTCTGCTGGCACTTT 739
          ||  ||  |    || |||||| ||   ||  |   |  ||| ||||| |||||
Db  1278 GGATTATAGGAACAATATTTCCTGTGGCGGGTTACTCCCTGGGGTTTCTTCTGGCTAGAA 1337

Qy   740 TTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATA 799
         || |    |   | ||| | |||||| | ||  || | || ||||| ||    ||||| |
Db  1338 TTGCTGGTCTACCCTGGTACAGGTGCCGAACGGTTGCTTTTGAAACGGGGATGCAGAACA 1397

Qy   800 TTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGT 859
           ||| | ||   ||||||  | ||| | || |||||| ||||| |  |        | |
Db  1398 CGCAGCTATGTTCCACCATCGTTCAGCTCTCCTTCACTCCTGAGGAGCTCAATGTCGTAT 1457

Qy   860 TGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAG 919
         | |  ||||| ||   |||  |  | ||||||||      ||      |  |  | | |
Db  1458 TCACCTTCCCGCTCATCTACAGCATTTTCCAGCTCGCCTTTGCCGCAATATTCTTAGGAT 1517

Qy   920 CATATCAGACGTACAAGAGGAGATTGAAGAACAAACATGGAAAAA 964
           |||  | | |||||||   |       || ||  | | | |||
Db  1518 TTTATGTGGCATACAAGAAATGTCATGGAAAAAACAAGGCAGAAA 1562





RESULT 5 

US-09-917-800A-1626
; Sequence 1626, Application US/09917800A
; Patent No. US20020119462A1
; GENERAL INFORMATION:
; APPLICANT: Mendrick, Donna
; APPLICANT: Porter, Mark
; APPLICANT: Johnson, Kory
; APPLICANT: Castle, Arthur
; APPLICANT: Elashoff, Michael
; APPLICANT: Gene Logic, Inc.
; TITLE OF INVENTION: Molecular Toxicology Modeling
; FILE REFERENCE: 44921-5038-US
; CURRENT APPLICATION NUMBER: US/09/917,800A
; CURRENT FILING DATE: 2001-07-31
; PRIOR APPLICATION NUMBER: US 60/222,040
; PRIOR FILING DATE: 2000-07-31
; PRIOR APPLICATION NUMBER: US 60/222,880
; PRIOR FILING DATE: 2000-11-02
; PRIOR APPLICATION NUMBER: US 60/290,029
; PRIOR FILING DATE: 2001-05-11
; PRIOR APPLICATION NUMBER: US 60/290,645
; PRIOR FILING DATE: 2001-05-15
; PRIOR APPLICATION NUMBER: US 60/292,336
; PRIOR FILING DATE: 2001-05-22
; PRIOR APPLICATION NUMBER: US 60/295,798
; PRIOR FILING DATE: 2001-06-06
; PRIOR APPLICATION NUMBER: US 60/297,457
; PRIOR FILING DATE: 2001-06-13
; PRIOR APPLICATION NUMBER: US 60/298,884
; PRIOR FILING DATE: 2001-06-19
; PRIOR APPLICATION NUMBER: US 60/303,459
; PRIOR FILING DATE: 2001-07-09
; NUMBER OF SEQ ID NOS: 1740
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 1626
; LENGTH: 1663
; TYPE: DNA
; ORGANISM: Rattus norvegicus
; FEATURE:
; OTHER INFORMATION: Genbank Accession No. NM_017047
US-09-917-800A-1626

Query Match 16.2%; Score 183.2; DB 9; Length 1663; 

Best Local Similarity 53.6%; Pred. No. 5.1e-49; 

Matches 430; Conservative 0; Mismatches 363; Indels 9; Gaps 2; 
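
The alignment that follows pairs each query (Qy) row with a database (Db) row and marks identical columns with `|`. The sketch below shows one way such a marker row and the Matches / Mismatches / Indels tallies could be derived from two gapped rows. It is an illustration only, not the search program's own code, and `match_row` and `tally` are hypothetical names.

```python
def match_row(qy: str, db: str) -> str:
    """Return a marker row with '|' under identical, non-gap columns."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have the same number of columns")
    return "".join("|" if q == d and q != "-" else " " for q, d in zip(qy, db))


def tally(qy: str, db: str) -> tuple[int, int, int]:
    """Count (matches, mismatches, indels) over the aligned columns."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have the same number of columns")
    matches = mismatches = indels = 0
    for q, d in zip(qy, db):
        if q == "-" or d == "-":
            indels += 1        # a gap character on either side
        elif q == d:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, indels


if __name__ == "__main__":
    # Final row of this alignment (Qy 890-911 against Db 999-1020)
    qy = "AGCTGATAGATGGATTTCTTAT"
    db = "AGCTTGCAGAAGGACTTCTCAT"
    print(match_row(qy, db))
    print(tally(qy, db))  # (16, 6, 0)
```

Summing the per-row tallies over every row of an alignment should give the totals printed on the Matches line, which makes this a convenient consistency check on a transcribed alignment.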

Qy   119 TGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGT 178
         | ||| ||  |||  ||||| ||||  |||| ||  || ||||  || | ||| |   |
Db   219 TAATGTTGCTGCTTATCATGCTCTCACTGGGCTGCACCATGGAATTCAGCAAGATCAAGG 278

Qy   179 CGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGC 238
         | ||| |  |||  |||   ||  |    || |   || |   |||||||||  ||||||
Db   279 CTCACTTGTGGAAGCCCAAAGGGGTGATCGTTGCCTTGGTGGCCCAGTTTGGCATCATGC 338

Qy   239 CTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTG 298
         |  |  | |||| ||| || | ||  | ||||   ||||      |  ||||| | ||
Db   339 CCCTCGCTGCTTTTCTTCTCGGCAAGATCTTTCACCTGAGCAACATTGAAGCTCTGGCCA 398

Qy   299 TTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTG 358
         | ||||||   |||||||  || ||||| | | | || ||| | |||||| |     |
Db   399 TCCTCATCTGTGGCTGCTCTCCCGGGGGGAACTTGTCCAACCTCTTCACCCTGGCCATGA 458

Qy   359 ATGGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAA 418
         | || || ||| | |||||||||   ||||| ||||| |||| | |    ||| |||| |
Db   459 AGGGGGACATGAACCTCAGCATCGTGATGACCACCTGCTCCAGCTTCAGTGCCTTGGGCA 518

Qy   419 TGATGCCACTCTGCATTTATCTCTACACC---TGGTCCTGGAGTCTTCAGCAGAATCTCA 475
         |||||||||||  | | ||  |||||| |    |   ||    |    | |  ||   ||
Db   519 TGATGCCACTCCTCTTATACGTCTACAGCAAAGGCATCTACGATGGAGACCTTAAGGACA 578

Qy   476 CCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTG 535
            | || ||  |   |||    ||  | || ||     |   |||||||    || | |
Db   579 AGGTGCCCTACAAAGGCATTATGATATCACTAGTCATAGTTCTCATTCCTTGCACCATAG 638

Qy   536 GTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCG 595
         |  ||    | || | || | ||||| |  |         ||| ||||||   ||
Db   639 GGATCGTCCTCAAGTCCAAAAGGCCACACTATGTACCCTACATCCTCAAGGGAGGCATGA 698

Qy   596 TTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGAT 655
         |  |        |||||  | |||  ||| |||       |||  ||   | |
Db   699 TCATCACCTTCCTCCTCTCTGTGGCTGTCACAGCCCTCTCTGTCATCAATGTGGGCAACA 758

Qy   656 CTTGGAATTCAGACATCAC------CCTTCTGACCATCAGTTTCATCTTTCCTTTGATTG 709
              |  |  | ||| ||      | | ||| | | |   | | |  | || ||   ||
Db   759 GCATCATGTTCGTCATGACACCACACTTACTGGCTACCTCCTCCCTGATGCCCTTCTCTG 818

Qy   710 GCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGA 769
         ||  | | | |||||   | ||  |   | |   ||| |    |       |  |  | |
Db   819 GCTTTCTGATGGGTTACATTCTCTCTGCTCTCTTCCAACTCAATCCAAGCTGCAGACGCA 878

Qy   770 CAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTAT 829
         | ||   | | ||||| |||   || || |||||  | ||    ||||| ||| |  |
Db   879 CCATCAGCATGGAAACAGGATTCCAAAACATTCAACTCTGTTCTACCATCCTCAATGTGA 938

Qy   830 CTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCC 889
         | ||| |  ||||   | | |  |   | ||    || || ||   |||     | ||||
Db   939 CCTTCCCCCCTGAAGTCATTGGGCCACTTTTCTTCTTTCCTCTCCTCTACATGATTTTCC 998

Qy   890 AGCTGATAGATGGATTTCTTAT 911
         ||||   ||| ||| |||| ||
Db   999 AGCTTGCAGAAGGACTTCTCAT 1020



RESULT 6 

US-10-388-934-263 

; Sequence 263, Application US/10388934
; Publication No. US20040005547A1
; GENERAL INFORMATION:
; APPLICANT: Boess, Franziska
; APPLICANT: Suter-Dick, Laura
; APPLICANT: Wolf, Detlef
; TITLE OF INVENTION: BIOMARKERS AND EXPRESSION PROFILES FOR TOXICOLOGY
; FILE REFERENCE: 21199
; CURRENT APPLICATION NUMBER: US/10/388,934
; CURRENT FILING DATE: 2003-03-14
; PRIOR APPLICATION NUMBER: 02005336.9
; PRIOR FILING DATE: 2002-03-14
; PRIOR APPLICATION NUMBER: 02015657.6
; PRIOR FILING DATE: 2002-07-17
; NUMBER OF SEQ ID NOS: 862
; SOFTWARE: PatentIn version 3.1
; SEQ ID NO 263
; LENGTH: 1663
; TYPE: DNA
; ORGANISM: Rattus norvegicus (Norway rat)
US-10-388-934-263 

Query Match 16.2%; Score 183.2; DB 15; Length 1663; 

Best Local Similarity 53.6%; Pred. No. 5.1e-49; 

Matches 430; Conservative 0; Mismatches 363; Indels 9; Gaps 2;

Qy   119 TGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGT 178
         | ||| ||  |||  ||||| ||||  |||| ||  || ||||  || | ||| |   |
Db   219 TAATGTTGCTGCTTATCATGCTCTCACTGGGCTGCACCATGGAATTCAGCAAGATCAAGG 278

Qy   179 CGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGC 238
         | ||| |  |||  |||   ||  |    || |   || |   |||||||||  ||||||
Db   279 CTCACTTGTGGAAGCCCAAAGGGGTGATCGTTGCCTTGGTGGCCCAGTTTGGCATCATGC 338

Qy   239 CTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTG 298
         |  |  | |||| ||| || | ||  | ||||   ||||      |  ||||| | ||
Db   339 CCCTCGCTGCTTTTCTTCTCGGCAAGATCTTTCACCTGAGCAACATTGAAGCTCTGGCCA 398

Qy   299 TTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTG 358
         | ||||||   |||||||  || ||||| | | | || ||| | |||||| |     |
Db   399 TCCTCATCTGTGGCTGCTCTCCCGGGGGGAACTTGTCCAACCTCTTCACCCTGGCCATGA 458

Qy   359 ATGGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAA 418
         | || || ||| | |||||||||   ||||| ||||| |||| | |    ||| |||| |
Db   459 AGGGGGACATGAACCTCAGCATCGTGATGACCACCTGCTCCAGCTTCAGTGCCTTGGGCA 518

Qy   419 TGATGCCACTCTGCATTTATCTCTACACC---TGGTCCTGGAGTCTTCAGCAGAATCTCA 475
         |||||||||||  | | ||  |||||| |    |   ||    |    | |  ||   ||
Db   519 TGATGCCACTCCTCTTATACGTCTACAGCAAAGGCATCTACGATGGAGACCTTAAGGACA 578

Qy   476 CCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTG 535
            | || ||  |   |||    ||  | || ||     |   |||||||    || | |
Db   579 AGGTGCCCTACAAAGGCATTATGATATCACTAGTCATAGTTCTCATTCCTTGCACCATAG 638

Qy   536 GTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCG 595
         |  ||    | || | || | ||||| |  |         ||| ||||||   ||
Db   639 GGATCGTCCTCAAGTCCAAAAGGCCACACTATGTACCCTACATCCTCAAGGGAGGCATGA 698

Qy   596 TTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGAT 655
         |  |        |||||  | |||  ||| |||       |||  ||   | |
Db   699 TCATCACCTTCCTCCTCTCTGTGGCTGTCACAGCCCTCTCTGTCATCAATGTGGGCAACA 758

Qy   656 CTTGGAATTCAGACATCAC------CCTTCTGACCATCAGTTTCATCTTTCCTTTGATTG 709
              |  |  | ||| ||      | | ||| | | |   | | |  | || ||   ||
Db   759 GCATCATGTTCGTCATGACACCACACTTACTGGCTACCTCCTCCCTGATGCCCTTCTCTG 818

Qy   710 GCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGA 769
         ||  | | | |||||   | ||  |   | |   ||| |    |       |  |  | |
Db   819 GCTTTCTGATGGGTTACATTCTCTCTGCTCTCTTCCAACTCAATCCAAGCTGCAGACGCA 878

Qy   770 CAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTAT 829
         | ||   | | ||||| |||   || || |||||  | ||    ||||| ||| |  |
Db   879 CCATCAGCATGGAAACAGGATTCCAAAACATTCAACTCTGTTCTACCATCCTCAATGTGA 938

Qy   830 CTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCC 889
         | ||| |  ||||   | | |  |   | ||    || || ||   |||     | ||||
Db   939 CCTTCCCCCCTGAAGTCATTGGGCCACTTTTCTTCTTTCCTCTCCTCTACATGATTTTCC 998

Qy   890 AGCTGATAGATGGATTTCTTAT 911
         ||||   ||| ||| |||| ||
Db   999 AGCTTGCAGAAGGACTTCTCAT 1020



RESULT 7 

US-09-880-107-2176 

; Sequence 2176, Application US/09880107
; Patent No. US20020142981A1
; GENERAL INFORMATION:
; APPLICANT: Horne, Darci T.
; APPLICANT: Vockley, Joseph G.
; APPLICANT: Scherf, Uwe
; APPLICANT: Gene Logic, Inc.
; TITLE OF INVENTION: Gene Expression Profiles in Liver Cancer
; FILE REFERENCE: 44921-5028-WO
; CURRENT APPLICATION NUMBER: US/09/880,107
; CURRENT FILING DATE: 2001-06-14
; PRIOR APPLICATION NUMBER: US 60/211,379
; PRIOR FILING DATE: 2000-06-14
; PRIOR APPLICATION NUMBER: US 60/237,054
; PRIOR FILING DATE: 2000-10-02
; NUMBER OF SEQ ID NOS: 3950
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 2176
; LENGTH: 1580
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: Genbank Accession No. L21893
US-09-880-107-2176 

Query Match 15.3%; Score 173.6; DB 9; Length 1580; 

Best Local Similarity 51.9%; Pred. No. 7.2e-46; 

Matches 445; Conservative 0; Mismatches 404; Indels 9; Gaps 2; 

Qy   119 TGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGT 178
         | ||| ||    |  ||||| ||||  |||| ||  || ||||| || | ||| |   |
Db   180 TCATGTTGTTCTTCATCATGCTCTCGCTGGGCTGCACCATGGAGTTCAGCAAGATCAAGG 239

Qy   179 CGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGC 238
         | ||| |  |||  ||    ||  | ||  | |  ||| |    |||| |||  ||||||
Db   240 CTCACTTATGGAAGCCTAAAGGGCTGGCCATCGCCCTGGTGGCACAGTATGGCATCATGC 299

Qy   239 CTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTG 298
         |  | || || | | | |||| ||    |||    ||||||    |  | ||  | ||
Db   300 CCCTCACGGCCTTTGTGCTGGGCAAGGTCTTCCGGCTGAAGAACATTGAGGCACTGGCCA 359

Qy   299 TTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTG 358
         |  |  ||   |||||||  || || || | | | || ||  | ||||   |     |
Db   360 TCTTGGTCTGTGGCTGCTCACCTGGAGGGAACCTGTCCAATGTCTTCAGTCTGGCCATGA 419

Qy   359 ATGGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAA 418
         | || || ||| | ||||||||    ||||| ||||| |||||| |    ||||| || |
Db   420 AGGGGGACATGAACCTCAGCATTGTGATGACCACCTGCTCCACCTTCTGTGCCCTTGGCA 479

Qy   419 TGATGCCACTCTGCATTTATCTCTACACC---TGGTCCTGGAGTCTTCAGCAGAATCTCA 475
         ||||||| |||  | | ||  ||||| ||    ||  ||    |    | | |||   ||
Db   480 TGATGCCTCTCCTCCTGTACATCTACTCCAGGGGGATCTATGATGGGGACCTGAAGGACA 539

Qy   476 CCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTG 535
            | || ||| |   ||| |  ||  | || ||     |   |||||||    || | |
Db   540 AGGTGCCCTATAAAGGCATCGTGATATCACTGGTCCTGGTTCTCATTCCTTGCACCATAG 599

Qy   536 GTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCG 595
         |  ||    | || | || | ||||| || |           |  |||||   |||
Db   600 GGATCGTCCTCAAATCCAAACGGCCACAATACATGCGCTATGTCATCAAGGGAGGGATGA 659

Qy   596 TTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGAT 655
         |  |   |    |   |  | |||  ||| |||||     ||   ||   | |
Db   660 TCATCATTCTCTTGTGCAGTGTGGCCGTCACAGTTCTCTCTGCCATCAATGTGGGGAAGA 719

Qy   656 CTTGGAATTCAGACATCAC------CCTTCTGACCATCAGTTTCATCTTTCCTTTGATTG 709
              |  |  | ||| ||      | |  |  ||| |   | | |  | ||||| ||||
Db   720 GCATCATGTTTGCCATGACACCACTCTTGATTGCCACCTCCTCCCTGATGCCTTTTATTG 779

Qy   710 GCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGA 769
         ||  | |   ||||| | | ||  |   | |   |  ||    | | |   |  |  | |
Db   780 GCTTTCTGCTGGGTTATGTTCTCTCTGCTCTCTTCTGCCTCAATGGACGGTGCAGACGCA 839

Qy   770 CAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTAT 829
         |  |   | | || ||||||   || ||| | ||  | ||   |||||| ||| |  |
Db   840 CTGTCAGCATGGAGACTGGATGCCAAAATGTCCAACTCTGTTCCACCATCCTCAATGTGG 899

Qy   830 CTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCC 889
         | ||  |  ||||   | | |  |   | ||    || || ||   |||     | ||||
Db   900 CCTTTCCACCTGAAGTCATTGGACCACTTTTCTTCTTTCCCCTCCTCTACATGATTTTCC 959

Qy   890 AGCTGATAGATGGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGA 949
         ||||   ||| ||  ||||  |  ||||   || |  |   ||  |||       ||
Db   960 AGCTTGGAGAAGGGCTTCTCCTCATTGCCATATTTTGGTGCTATGAGAAATTCAAGACTC 1019

Qy   950 ACAAACATGGAAAAAAGA 967
          |||  ||  || ||| |
Db  1020 CCAAGGATAAAACAAAAA 1037





RESULT 8 

US-10-288-222A-15 

; Sequence 15, Application US/10288222A
; Publication No. US20030119742A1
; GENERAL INFORMATION:
; APPLICANT: Logan, Thomas Joseph
; APPLICANT: Galvin, Katherine
; APPLICANT: Chun, Miyoung
; TITLE OF INVENTION: Methods and Compositions to treat
; TITLE OF INVENTION: Cardiovascular Disease Using 139, 258, 1261, 1486, 2398, 2414, 7660, 8587,
; TITLE OF INVENTION: 10183, 10550, 12680, 17921, 32248, 60489 OR 93804
; FILE REFERENCE: MPI2001-286P1R (M)
; CURRENT APPLICATION NUMBER: US/10/288,222A
; CURRENT FILING DATE: 2002-11-05
; NUMBER OF SEQ ID NOS: 30
; SOFTWARE: FastSEQ for Windows Version 4.0
; SEQ ID NO 15
; LENGTH: 1580
; TYPE: DNA
; ORGANISM: Homo Sapien
US-10-288-222A-15 

Query Match 15.3%; Score 173.6; DB 14; Length 1580; 

Best Local Similarity 51.9%; Pred. No. 7.2e-46; 

Matches 445; Conservative 0; Mismatches 404; Indels 9; Gaps 2; 

Qy   119 TGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGT 178
         | ||| ||    |  ||||| ||||  |||| ||  || ||||| || | ||| |   |
Db   180 TCATGTTGTTCTTCATCATGCTCTCGCTGGGCTGCACCATGGAGTTCAGCAAGATCAAGG 239

Qy   179 CGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGC 238
         | ||| |  |||  ||    ||  | ||  | |  ||| |    |||| |||  ||||||
Db   240 CTCACTTATGGAAGCCTAAAGGGCTGGCCATCGCCCTGGTGGCACAGTATGGCATCATGC 299

Qy   239 CTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTG 298
         |  | || || | | | |||| ||    |||    ||||||    |  | ||  | ||
Db   300 CCCTCACGGCCTTTGTGCTGGGCAAGGTCTTCCGGCTGAAGAACATTGAGGCACTGGCCA 359

Qy   299 TTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTG 358
         |  |  ||   |||||||  || || || | | | || ||  | ||||   |     |
Db   360 TCTTGGTCTGTGGCTGCTCACCTGGAGGGAACCTGTCCAATGTCTTCAGTCTGGCCATGA 419

Qy   359 ATGGAGATATGGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAA 418
         | || || ||| | ||||||||    ||||| ||||| |||||| |    ||||| || |
Db   420 AGGGGGACATGAACCTCAGCATTGTGATGACCACCTGCTCCACCTTCTGTGCCCTTGGCA 479

Qy   419 TGATGCCACTCTGCATTTATCTCTACACC---TGGTCCTGGAGTCTTCAGCAGAATCTCA 475
         ||||||| |||  | | ||  ||||| ||    ||  ||    |    | | |||   ||
Db   480 TGATGCCTCTCCTCCTGTACATCTACTCCAGGGGGATCTATGATGGGGACCTGAAGGACA 539

Qy   476 CCATTCCTTATCAGAACATAGGAATTACCCTTGTGTGCCTGACCATTCCTGTGGCCTTTG 535
            | || ||| |   ||| |  ||  | || ||     |   |||||||    || | |
Db   540 AGGTGCCCTATAAAGGCATCGTGATATCACTGGTCCTGGTTCTCATTCCTTGCACCATAG 599

Qy   536 GTGTCTATGTGAATTACAGATGGCCAAAACAATCCAAAATCATTCTCAAGATTGGGGCCG 595
         |  ||    | || | || | ||||| || |           |  |||||   |||
Db   600 GGATCGTCCTCAAATCCAAACGGCCACAATACATGCGCTATGTCATCAAGGGAGGGATGA 659

Qy   596 TTGTTGGTGGGGTCCTCCTTCTGGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGAT 655
         |  |   |    |   |  | |||  ||| |||||     ||   ||   | |
Db   660 TCATCATTCTCTTGTGCAGTGTGGCCGTCACAGTTCTCTCTGCCATCAATGTGGGGAAGA 719

Qy   656 CTTGGAATTCAGACATCAC------CCTTCTGACCATCAGTTTCATCTTTCCTTTGATTG 709
              |  |  | ||| ||      | |  |  ||| |   | | |  | ||||| ||||
Db   720 GCATCATGTTTGCCATGACACCACTCTTGATTGCCACCTCCTCCCTGATGCCTTTTATTG 779

Qy   710 GCCATGTCACGGGTTTTCTGCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGA 769
         ||  | |   ||||| | | ||  |   | |   |  ||    | | |   |  |  | |
Db   780 GCTTTCTGCTGGGTTATGTTCTCTCTGCTCTCTTCTGCCTCAATGGACGGTGCAGACGCA 839

Qy   770 CAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTAT 829
         |  |   | | || ||||||   || ||| | ||  | ||   |||||| ||| |  |
Db   840 CTGTCAGCATGGAGACTGGATGCCAAAATGTCCAACTCTGTTCCACCATCCTCAATGTGG 899

Qy   830 CTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCC 889
         | ||  |  ||||   | | |  |   | ||    || || ||   |||     | ||||
Db   900 CCTTTCCACCTGAAGTCATTGGACCACTTTTCTTCTTTCCCCTCCTCTACATGATTTTCC 959

Qy   890 AGCTGATAGATGGATTTCTTATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGA 949
         ||||   ||| ||  ||||  |  ||||   || |  |   ||  |||       ||
Db   960 AGCTTGGAGAAGGGCTTCTCCTCATTGCCATATTTTGGTGCTATGAGAAATTCAAGACTC 1019

Qy   950 ACAAACATGGAAAAAAGA 967
          |||  ||  || ||| |
Db  1020 CCAAGGATAAAACAAAAA 1037



RESULT 9 

US-10-085-198-113 

; Sequence 113, Application US/10085198
; Publication No. US20040009907A1
; GENERAL INFORMATION:
; APPLICANT: Alsobrook et al.
; TITLE OF INVENTION: Proteins and Nucleic Acids Encoding Same
; FILE REFERENCE: 21402-279
; CURRENT APPLICATION NUMBER: US/10/085,198
; CURRENT FILING DATE: 2002-02-25
; PRIOR APPLICATION NUMBER: 60/271,646
; PRIOR FILING DATE: 2001-02-26
; PRIOR APPLICATION NUMBER: 60/276,401
; PRIOR FILING DATE: 2001-03-16
; PRIOR APPLICATION NUMBER: 60/311,981
; PRIOR FILING DATE: 2001-08-13
; PRIOR APPLICATION NUMBER: 60/312,858
; PRIOR FILING DATE: 2001-08-16
; PRIOR APPLICATION NUMBER: 60/271,840
; PRIOR FILING DATE: 2001-02-27
; PRIOR APPLICATION NUMBER: 60/277,324
; PRIOR FILING DATE: 2001-03-20
; PRIOR APPLICATION NUMBER: 60/286,096
; PRIOR FILING DATE: 2001-04-21
; PRIOR APPLICATION NUMBER: 60/299,695
; PRIOR FILING DATE: 2001-06-20
; PRIOR APPLICATION NUMBER: 60/315,614
; PRIOR FILING DATE: 2001-08-29
; PRIOR APPLICATION NUMBER: 60/272,405
; PRIOR FILING DATE: 2001-02-28
; Remaining Prior Application data removed - See File Wrapper or PALM.
; NUMBER OF SEQ ID NOS: 653
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 113
; LENGTH: 1988
; TYPE: DNA
; ORGANISM: Homo sapiens
US-10-085-198-113 

Query Match 12.5%; Score 141.4; DB 15; Length 1988; 

Best Local Similarity 50.8%; Pred. No. 3.5e-35; 

Matches 451; Conservative 0; Mismatches 416; Indels 20; Gaps 4; 

Qy    88 CTGGAGCTCGTTTTCACAGTGGTGTCCACTGTGATGATGGGGCTGCTCATGTTCTCTTTG 147
         ||| | | ||   | |  ||| |       |      || |  |   |||| |     ||
Db   379 CTGAACCACGGGCTGAACGTGTTCGTGGGCGCCGCCCTGTGCATCACCATGCTGGGCCTG 438

Qy   148 GGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAGGAGACCCTGGGGCATTGCT 207
         || ||  | |||||  |     |  |  || ||||| || |  | |||  ||||
Db   439 GGCTGCACGGTGGACGTGAACCACTTCGGGGCGCACGTCCGTCGGCCCGTGGGCGCGCTG 498

Qy   208 GTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGCTTATCTCCTGGCCATTAGC 267
          ||| |  |||||||||||| || ||| ||||  |    || |  || |||||| |   |
Db   499 CTGGCAGCGCTCTGCCAGTTCGGCCTCCTGCCGCTGCTGGCCTTCCTGCTGGCCCTCGCC 558

Qy   268 TTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCATGGGCTGCTGCCCGGGGGGC 327
         ||    ||| |    ||    ||  | || || ||| |    |||||||| || || |||
Db   559 TTCAAGCTGGACGAGGTGGCCGCCGTGGCGGTGCTCCTGTGTGGCTGCTGTCCCGGCGGC 618

Qy   328 ACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATATGGATCTCAGCATCAGTATG 387
         |   |||| ||  || |  || |   |||||| || || ||| | ||||||||||  |||
Db   619 AATCTCTCCAATCTTATGTCCCTGCTGGTTGACGGCGACATGAACCTCAGCATCATCATG 678

Qy   388 ACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACTCTGCATTTATCTCTACACC 447
         || | ||  |||||  |    ||||| |   ||||||| || ||| | |   |||||| |
Db   679 ACCATCTCCTCCACGCTTCTGGCCCTCGTCTTGATGCCCCTGTGCCTGTGGATCTACAGC 738

Qy 448 TGGTCCTGGAGTCTTCAGCAGAATCTCACCATTCCTTATCAGAACATAGGAATTAC 503

M I I II II I I I I I Ml i II I I I I I I I Ml 

Db 739 TGGGCTTGGA-TCAACACCCCTATCGTGCAGTTACTACCCCTAGGGACCGTGACCCTGAC 797 

Qy   504 CCTTGTGTGCCTGACCATTCCTGTGGCCTTTGGTGTCTATGTGAATTACAGATGGCCAAA 563
          ||     ||     ||| ||| | |  || || ||||   |    |||| ||
Db   798 TCTCTGCAGCACTCTCATACCTATCGGGTTGGGCGTCTTCATTCGCTACAAATACAGCCG 857

Qy   564 ACAATCCAAAATCATTCTCAAGATTGGGGCCGTTGTTGGTGGGGTCCTCCT------TCT 617
              |  |   |||| | ||| |  || |||||       ||   || ||      |||
Db   858 GGTGGCTGACTACATTGTGAAGGTAAGGCCCGTTTCCCTGTGGTCTCTGCTAGTGACTCT 917

Qy   618 GGTGGTCGCAGTTGCTGGTGTGGTCCTGGCGAAAGGATCTTGGAATTCAGACATCACCCT 677
         |||||||    |          | |     |  |||| ||         |  |  | ||
Db   918 GGTGGTCCTTTTCATAATGACCGGCACTATGTTAGGACCTGAACTGCTGGCAAGTATCCC 977

Qy   678 TCTGACCATCAGTTTCA---------TCTTTCCTTTGATTGGCCATGTCACGGGTTTTCT 728
         |    |  |   | | |         |  | ||||||   ||| | |   | |||| |
Db   978 TGCAGCTGTTTATGTGATAGCAATTTTTATGCCTTTGGCAGGCTACGCTTCAGGTTATGG 1037

Qy   729 GCTGGCACTTTTTACCCACCAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGG 788
           | ||   | |   ||| |   |     |  |    |||||  | |   | ||||| ||
Db  1038 TTTAGCTACTCTCTTCCATCTTCCACCCAACTGCAAGAGGACTGTATGTCTGGAAACAGG 1097

Qy   789 AGCTCAGAATATTCAGATGTGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTT 848
            ||||||| | ||| | || |   |||| ||  |  |  | ||  |  |  |   | |
Db  1098 TAGTCAGAATGTGCAGCTCTGTACAGCCATTCTAAAACTGGCCTTTCCACCGCAATTCAT 1157

Qy   849 GGTCCAGATGTTGAGTTTCCCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCT 908
          |     ||||  |  || ||  ||   |||| ||| ||||||     ||| |      |
Db  1158 AGGAAGCATGTACATGTTTCCTTTGCTGTATGCACTTTTCCAGTCTGCAGAAGCGGGGAT 1217

Qy   909 TATTGTTGCAGCATATCAGACGTACAAGAGGAGATTGAAGAACAAAC 955
         | |||||  |   ||| | | |||    ||   | ||  | |||| |
Db  1218 TTTTGTTTTAATCTATAAAATGTATGGAAGTGAAATGTTGCACAAGC 1264



RESULT 10 

US-09-864-761-31375/c 

; Sequence 31375, Application US/09864761
; Patent No. US20020048763A1
; GENERAL INFORMATION:
; APPLICANT: Penn, Sharron G.
; APPLICANT: Rank, David R.
; APPLICANT: Hanzel, David K.
; APPLICANT: Chen, Wensheng
; TITLE OF INVENTION: HUMAN GENOME-DERIVED SINGLE EXON NUCLEIC ACID PROBES USEFUL FOR
; TITLE OF INVENTION: GENE EXPRESSION ANALYSIS BY MICROARRAY
; FILE REFERENCE: Aeomica-X-1
; CURRENT APPLICATION NUMBER: US/09/864,761
; CURRENT FILING DATE: 2001-05-23
; PRIOR APPLICATION NUMBER: US 60/180,312
; PRIOR FILING DATE: 2000-02-04
; PRIOR APPLICATION NUMBER: US 60/207,456
; PRIOR FILING DATE: 2000-05-26
; PRIOR APPLICATION NUMBER: US 09/632,366
; PRIOR FILING DATE: 2000-08-03
; PRIOR APPLICATION NUMBER: GB 24263.6
; PRIOR FILING DATE: 2000-10-04
; PRIOR APPLICATION NUMBER: US 60/236,359
; PRIOR FILING DATE: 2000-09-27
; PRIOR APPLICATION NUMBER: PCT/US01/00666
; PRIOR FILING DATE: 2001-01-30
; PRIOR APPLICATION NUMBER: PCT/US01/00667
; PRIOR FILING DATE: 2001-01-30
; PRIOR APPLICATION NUMBER: PCT/US01/00664
; PRIOR FILING DATE: 2001-01-30
; PRIOR APPLICATION NUMBER: PCT/US01/00669
; PRIOR FILING DATE: 2001-01-30
; PRIOR APPLICATION NUMBER: PCT/US01/00665
; PRIOR FILING DATE: 2001-01-30
; PRIOR APPLICATION NUMBER: PCT/US01/00668
; PRIOR FILING DATE: 2001-01-30
; PRIOR APPLICATION NUMBER: PCT/US01/00663
; PRIOR FILING DATE: 2001-01-30
; PRIOR APPLICATION NUMBER: PCT/US01/00662
; PRIOR FILING DATE: 2001-01-30
; PRIOR APPLICATION NUMBER: PCT/US01/00661
; PRIOR FILING DATE: 2001-01-30
; PRIOR APPLICATION NUMBER: PCT/US01/00670
; PRIOR FILING DATE: 2001-01-30
; PRIOR APPLICATION NUMBER: US 60/234,687
; PRIOR FILING DATE: 2000-09-21
; PRIOR APPLICATION NUMBER: US 09/608,408
; PRIOR FILING DATE: 2000-06-30
; PRIOR APPLICATION NUMBER: US 09/774,203
; PRIOR FILING DATE: 2001-01-29
; NUMBER OF SEQ ID NOS: 49117
; SOFTWARE: Annomax Sequence Listing Engine vers. 1.1
; SEQ ID NO 31375
; LENGTH: 360
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: MAP TO AL157789.1
; OTHER INFORMATION: EXPRESSED IN ADULT LIVER, SIGNAL =2.2
; OTHER INFORMATION: NT HIT: gill435250, EVALUE 0.00e+00
; OTHER INFORMATION: SWISSPROT HIT: Q14973, EVALUE 7.00e-64
; OTHER INFORMATION: EST_HUMAN HIT: W01479.1, EVALUE 0.00e+00
US-09-864-761-31375



Query Match 7.0%; Score 79.8; DB 9; Length 360; 

Best Local Similarity 56.8%; Pred. No. 2.7e-15; 

Matches 147; Conservative 0; Mismatches 112; Indels 0; Gaps 0; 
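
The `/c` suffix on this hit and the descending Db coordinates in the alignment below (263 down to 5) suggest that the query matches the complementary strand of the database entry. That reading is an interpretation; the listing itself does not say so. Under that assumption, the Db rows would be the reverse complement of the stored sequence. The sketch below shows the operation, and `reverse_complement` is an illustrative name.

```python
# Translation table for the four unambiguous DNA bases (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order of the bases."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))  # GCAT
    # Applying the operation twice returns the original sequence.
    print(reverse_complement(reverse_complement("TCATGTTGTT")))  # TCATGTTGTT
```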

Qy   119 TGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGT 178
         | ||| ||    |  ||||| ||||  |||| ||  || ||||| || | ||| |   |
Db   263 TCATGTTGTTCTTCATCATGCTCTCGCTGGGCTGCACCATGGAGTTCAGCAAGATCAAGG 204

Qy   179 CGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGC 238
         | ||| |  |||  ||    ||  | ||  | |  ||| |    |||| |||  ||||||
Db   203 CTCACTTATGGAAGCCTAAAGGGCTGGCCATCGCCCTGGTGGCACAGTATGGCATCATGC 144

Qy   239 CTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTG 298
         |  | || || | | | |||| ||    |||    ||||||    |  | ||  | ||
Db   143 CCCTCACGGCCTTTGTGCTGGGCAAGGTCTTCCGGCTGAAGAACATTGAGGCACTGGCCA 84

Qy   299 TTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTG 358
         |  |  ||   |||||||  || || || | | | || ||  | ||||   |     |
Db    83 TCTTGGTCTGTGGCTGCTCACCTGGAGGGAACCTGTCCAATGTCTTCAGTCTGGCCATGA 24

Qy   359 ATGGAGATATGGATCTCAG 377
         | || || ||| | |||||
Db    23 AGGGGGACATGAACCTCAG 5



RESULT 11 

US-09-864-761-14847/c 

; Sequence 14847, Application US/09864761 

; Patent No. US20020048763A1 

; GENERAL INFORMATION: 

; APPLICANT: Penn, Sharron G. 

; APPLICANT: Rank, David R. 

; APPLICANT: Hanzel, David K. 

APPLICANT: Chen, Wensheng 
; TITLE OF INVENTION: HUMAN GENOME- DERIVED SINGLE EXON NUCLEIC ACID PROBES 
USEFUL FOR 

; TITLE OF INVENTION: GENE EXPRESSION ANALYSIS BY MICROARRAY 

; FILE REFERENCE: Aeomica-X-1 

; CURRENT APPLICATION NUMBER: US/09/864 , 761 

; CURRENT FILING DATE: 2001-05-23 

; PRIOR APPLICATION NUMBER: US 60/180,312 

; PRIOR FILING DATE: 2000-02-04 

PRIOR APPLICATION NUMBER: US 60/207,456 
; PRIOR FILING DATE: 2000-05-26 
; PRIOR APPLICATION NUMBER: US 09/632,366 
; PRIOR FILING DATE: 2000-08-03 
; PRIOR APPLICATION NUMBER: GB 24263.6 
; PRIOR FILING DATE: 2000-10-04 
; PRIOR APPLICATION NUMBER: US 60/236,359 
; PRIOR FILING DATE: 2000-09-27 

PRIOR APPLICATION NUMBER: PCT/US01/00666 
; PRIOR FILING DATE: 2001-01-30 
; PRIOR APPLICATION NUMBER: PCT/US01/00667 
; PRIOR FILING DATE: 2001-01-30 
; PRIOR APPLICATION NUMBER: PCT/US01/00664 
; PRIOR FILING DATE: 2001-01-30 
; PRIOR APPLICATION NUMBER: PCT/US01/00669

; PRIOR FILING DATE: 2001-01-30

; PRIOR APPLICATION NUMBER: PCT/US01/00665 

; PRIOR FILING DATE: 2001-01-30 

; PRIOR APPLICATION NUMBER: PCT/US01/00668 

; PRIOR FILING DATE: 2001-01-30 

; PRIOR APPLICATION NUMBER: PCT/US01/00663 

; PRIOR FILING DATE: 2001-01-30 

; PRIOR APPLICATION NUMBER: PCT/US01/00662 

; PRIOR FILING DATE: 2001-01-30 

; PRIOR APPLICATION NUMBER: PCT/US01/00661
; PRIOR FILING DATE: 2001-01-30 
; PRIOR APPLICATION NUMBER: PCT/US01/00670 
; PRIOR FILING DATE: 2001-01-30 
; PRIOR APPLICATION NUMBER: US 60/234,687 
; PRIOR FILING DATE: 2000-09-21 
; PRIOR APPLICATION NUMBER: US 09/608,408 
; PRIOR FILING DATE: 2000-06-30 

; PRIOR APPLICATION NUMBER: US 09/774,203
; PRIOR FILING DATE: 2001-01-29
; NUMBER OF SEQ ID NOS: 49117
; SOFTWARE: Annomax Sequence Listing Engine vers. 1.1
; SEQ ID NO 14847
; LENGTH: 560
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:

; OTHER INFORMATION: MAP TO AL157789.1 

; OTHER INFORMATION: EXPRESSED IN ADULT LIVER, SIGNAL =2.2
US-09-864-761-14847 

Query Match 7.0%; Score 79.8; DB 9; Length 560; 

Best Local Similarity 56.8%; Pred. No. 3.5e-15; 

Matches 147; Conservative 0; Mismatches 112; Indels 0; Gaps 0; 

Qy 119 TGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGT 178
       | ||| ||    |  ||||| ||||  |||| ||  || ||||| || | ||| |   |
Db 403 TCATGTTGTTCTTCATCATGCTCTCGCTGGGCTGCACCATGGAGTTCAGCAAGATCAAGG 344

Qy 179 CGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGC 238
       | ||| |  |||  ||    ||  | ||  | |  ||| |    |||| |||  ||||||
Db 343 CTCACTTATGGAAGCCTAAAGGGCTGGCCATCGCCCTGGTGGCACAGTATGGCATCATGC 284

Qy 239 CTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTG 298
       |  | || || | | | |||| ||    |||    ||||||    |  | ||  | ||
Db 283 CCCTCACGGCCTTTGTGCTGGGCAAGGTCTTCCGGCTGAAGAACATTGAGGCACTGGCCA 224

Qy 299 TTCTCATCATGGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTG 358
       |  |  ||   |||||||  || || || | | | || ||  | ||||   |     |
Db 223 TCTTGGTCTGTGGCTGCTCACCTGGAGGGAACCTGTCCAATGTCTTCAGTCTGGCCATGA 164

Qy 359 ATGGAGATATGGATCTCAG 377
       | || || ||| | |||||
Db 163 AGGGGGACATGAACCTCAG 145



RESULT 12 

US-09-833-381-317/c 



; Sequence 317, Application US/09833381 

; Patent No. US20020132090A1 

; GENERAL INFORMATION: 

; APPLICANT: Robison, Keith E. 

; TITLE OF INVENTION: Novel Nucleic Acid and Protein Homologs
; FILE REFERENCE: 5800-119 

; CURRENT APPLICATION NUMBER: US/09/833,381

; CURRENT FILING DATE: 2001-04-11 

; PRIOR APPLICATION NUMBER: 09/516,448 

; PRIOR FILING DATE: 2000-02-29
; NUMBER OF SEQ ID NOS: 2050
; SOFTWARE: FastSEQ for Windows Version 3.0
; SEQ ID NO 317
; LENGTH: 310
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: misc_feature
; LOCATION: (1)...(310)
; OTHER INFORMATION: n = A,T,C or G
US-09-833-381-317 

Query Match 6.9%; Score 77.8; DB 9; Length 310; 

Best Local Similarity 60.2%; Pred. No. 1.1e-14; 

Matches 127; Conservative 0; Mismatches 84; Indels 0; Gaps 0; 

Qy 748 CAGTCTTGGCAAAGGTGCAGGACAATTTCCTTAGAAACTGGAGCTCAGAATATTCAGATG 807
       ||  | ||| | |||||| | ||| |  |||| |||||||||   ||||| | |||| ||
Db 221 CAACCCTGGTACAGGTGCCGAACAGTAGCCTTGGAAACTGGAATGCAGAACACTCAGCTG 162

Qy 808 TGCATCACCATGCTCCAGTTATCTTTCACTGCTGAGCACTTGGTCCAGATGTTGAGTTTC 867
       |||  ||||||  | ||| | || ||| |  | ||| |  |   || | |||| |  |||
Db 161 TGCTCCACCATTGTACAGCTCTCCTTCTCCCCCGAGGATCTCAACCTGGTGTTCACCTTC 102

Qy 868 CCACTGGCCTATGGACTCTTCCAGCTGATAGATGGATTTCTTATTGTTGCAGCATATCAG 927
       |||||   ||||    | ||||||||  |   || |    | ||  | |     |||
Db 101 CCACTCATCTATACTGTTTTCCAGCTCGTCTTTGCAGCAGTNATATTAGGNATTTATGTC 42

Qy 928 ACGTACAAGAGGAGATTGAAGAACAAACATG 958
       || |||| ||   | |     || ||  |||
Db  41 ACATACAGGAAATGTTATGGAAAAAATGATG 11



RESULT 13 

US-09-960-352-2253 

; Sequence 2253, Application US/09960352 

; Patent No. US20020137139A1

; GENERAL INFORMATION:

; APPLICANT: Warren, Wesley C. 

; APPLICANT: Tao, Nengbing 

; APPLICANT: Byatt, John C. 

; APPLICANT: Mathialagan, Nagappan
; TITLE OF INVENTION: NUCLEIC ACID AND OTHER MOLECULES ASSOCIATED WITH LACTATION AND
; TITLE OF INVENTION: MUSCLE AND FAT DEPOSITION
; FILE REFERENCE: 16511.006/37-21(10298)C
; CURRENT APPLICATION NUMBER: US/09/960,352
; CURRENT FILING DATE: 2001-09-24
; NUMBER OF SEQ ID NOS: 15112
; SEQ ID NO 2253
; LENGTH: 401
; TYPE: DNA
; ORGANISM: Bos taurus
; FEATURE:
; NAME/KEY: unsure
; LOCATION: (390)
; OTHER INFORMATION: unsure at all n locations
; OTHER INFORMATION: Clone ID: 10-LIB34-014-Q1-E1-C5
US-09-960-352-2253 



Query Match 6.5%; Score 74; DB 9; Length 401;

Best Local Similarity 60.1%; Pred. No. 2.3e-13;

Matches 122; Conservative 0; Mismatches 81; Indels 0; Gaps 0;

Qy 119 TGATGATGGGGCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGT 178
       | ||| ||       ||||| ||||  |||| ||  || ||||| || | ||| |   |
Db 199 TCATGCTGTTAACCATCATGCTCTCGCTGGGTTGCACCATGGAGTTCAGCAAGATCAAGG 258

Qy 179 CGCACATCAGGAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGC 238
       ||||| || |||| |||  |||  | || || |  ||| |    ||||||||  ||||||
Db 259 CGCACTTCTGGAGGCCCAAGGGGCTGGCCGTCGCTCTGGTGGCGCAGTTTGGCATCATGC 318

Qy 239 CTTTTACAGCTTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTG 298
       |  | || || | |   |||| ||    |||    |||||    ||  | ||  | ||
Db 319 CCCTCACTGCCTTTGGACTGGGCAAGTTCTTCCAGCTGAATAACGTTGAGGCCCTAGCCA 378

Qy 299 TTCTCATCATGGGCTGCTGCCCG 321
       | || |||    ||||||  |||
Db 379 TCCTGATCTGCNGCTGCTCACCG 401



RESULT 14 

US-09-738-626-2554 

; Sequence 2554, Application US/09738626
; Publication No. US20020197605A1
; GENERAL INFORMATION:
; APPLICANT: NAKAGAWA, SATOSHI
; APPLICANT: MIZOGUCHI, HIROSHI
; APPLICANT: ANDO, SEIKO
; APPLICANT: HAYASHI, MIKIRO
; APPLICANT: OCHIAI, KEIKO
; APPLICANT: YOKOI, HARUHIKO
; APPLICANT: TATEISHI, NAOKO
; APPLICANT: SENOH, AKIHIRO
; APPLICANT: IKEDA, MASATO
; APPLICANT: OZAKI, AKIO
; TITLE OF INVENTION: NOVEL POLYNUCLEOTIDES
; FILE REFERENCE: 249-125
; CURRENT APPLICATION NUMBER: US/09/738,626
; CURRENT FILING DATE: 2000-12-18
; PRIOR APPLICATION NUMBER: JP 99/377484
; PRIOR FILING DATE: 1999-12-16
; PRIOR APPLICATION NUMBER: JP 00/159162
; PRIOR FILING DATE: 2000-04-07
; PRIOR APPLICATION NUMBER: JP 00/280988
; PRIOR FILING DATE: 2000-08-03
; NUMBER OF SEQ ID NOS: 7059
; SOFTWARE: PatentIn ver. 3.0
; SEQ ID NO 2554
; LENGTH: 972
; TYPE: DNA
; ORGANISM: Corynebacterium glutamicum
US-09-738-626-2554 

Query Match 6.0%; Score 67.6; DB 9; Length 972; 

Best Local Similarity 51.3%; Pred. No. 5.1e-11; 

Matches 157; Conservative 0; Mismatches 149; Indels 0; Gaps 0; 

Qy 129 GCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAG 188
       | |  |||||||| |  |||| |   || ||  | | |   |  |   |  |    | |
Db 141 GATCATCATGTTCACCATGGGTTTGACCTTGACGGTGCCCGATTTTCAGATGGTGCTTAA 200

Qy 189 GAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGC 248
         | ||   |   ||     | ||  |  |    |||||||   ||||||| ||    ||
Db 201 ACGTCCACTGCCTATCTTGATCGGTGTAGTAGCGCAGTTTGTCATCATGCCATTCCTGGC 260

Qy 249 TTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCAT 308
            |  | || |  |  ||    || || ||||  |  ||  |||   |||||||  |
Db 261 GATCGTGGTTGCGAAAATGTTCAACCTCAACCCAGCACTCGCCGTTGGCCTTCTCATGCT 320

Qy 309 GGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATAT 368
       ||| | |   ||||| ||||||  ||| ||  |  |  | ||    |     |||||| |
Db 321 GGGATCCGTTCCGGGTGGCACCTCCTCCAATGTGATTGCGTTTCTCGCCCGAGGAGATGT 380

Qy 369 GGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACT 428
        |  ||     |||  |||||  |    |||||| | |   |||     |||| |||  |
Db 381 CGCGCTATCGGTCACCATGACCTCTGTGTCCACCATTGTTTCCCCAATCATGACGCCTTT 440

Qy 429 CTGCAT 434
       |  |||
Db 441 CCTCAT 446



RESULT 15 
US-09-738-626-1/c

; Sequence 1, Application US/09738626
; Publication No. US20020197605A1
; GENERAL INFORMATION:
; APPLICANT: NAKAGAWA, SATOSHI
; APPLICANT: MIZOGUCHI, HIROSHI
; APPLICANT: ANDO, SEIKO
; APPLICANT: HAYASHI, MIKIRO
; APPLICANT: OCHIAI, KEIKO
; APPLICANT: YOKOI, HARUHIKO
; APPLICANT: TATEISHI, NAOKO
; APPLICANT: SENOH, AKIHIRO
; APPLICANT: IKEDA, MASATO
; APPLICANT: OZAKI, AKIO
; TITLE OF INVENTION: NOVEL POLYNUCLEOTIDES
; FILE REFERENCE: 249-125
; CURRENT APPLICATION NUMBER: US/09/738,626
; CURRENT FILING DATE: 2000-12-18
; PRIOR APPLICATION NUMBER: JP 99/377484
; PRIOR FILING DATE: 1999-12-16
; PRIOR APPLICATION NUMBER: JP 00/159162
; PRIOR FILING DATE: 2000-04-07
; PRIOR APPLICATION NUMBER: JP 00/280988
; PRIOR FILING DATE: 2000-08-03
; NUMBER OF SEQ ID NOS: 7059
; SOFTWARE: PatentIn ver. 3.0
; SEQ ID NO 1
; LENGTH: 3309400
; TYPE: DNA
; ORGANISM: Corynebacterium glutamicum
US-09-738-626-1 

Query Match 6.0%; Score 67.6; DB 9; Length 3309400; 

Best Local Similarity 51.3%; Pred. No. 6.1e-09; 

Matches 157; Conservative 0; Mismatches 149; Indels 0; Gaps 0; 

Qy     129 GCTGCTCATGTTCTCTTTGGGATGTTCCGTGGAGATCCGGAAGCTGTGGTCGCACATCAG 188
           | |  |||||||| |  |||| |   || ||  | | |   |  |   |  |    | |
Db 2466869 GATCATCATGTTCACCATGGGTTTGACCTTGACGGTGCCCGATTTTCAGATGGTGCTTAA 2466810

Qy     189 GAGACCCTGGGGCATTGCTGTGGGACTGCTCTGCCAGTTTGGGCTCATGCCTTTTACAGC 248
             | ||   |   ||     | ||  |  |    |||||||   ||||||| ||    ||
Db 2466809 ACGTCCACTGCCTATCTTGATCGGTGTAGTAGCGCAGTTTGTCATCATGCCATTCCTGGC 2466750

Qy     249 TTATCTCCTGGCCATTAGCTTTTCTCTGAAGCCAGTCCAAGCTATTGCTGTTCTCATCAT 308
                |  | || |  |  ||    || || ||||  |  ||  |||   |||||||  |
Db 2466749 GATCGTGGTTGCGAAAATGTTCAACCTCAACCCAGCACTCGCCGTTGGCCTTCTCATGCT 2466690

Qy     309 GGGCTGCTGCCCGGGGGGCACCATCTCTAACATTTTCACCTTCTGGGTTGATGGAGATAT 368
           ||| | |   ||||| ||||||  ||| ||  |  |  | ||    |     |||||| |
Db 2466689 GGGATCCGTTCCGGGTGGCACCTCCTCCAATGTGATTGCGTTTCTCGCCCGAGGAGATGT 2466630

Qy     369 GGATCTCAGCATCAGTATGACAACCTGTTCCACCGTGGCCGCCCTGGGAATGATGCCACT 428
            |  ||     |||  |||||  |    |||||| | |   |||     |||| |||  |
Db 2466629 CGCGCTATCGGTCACCATGACCTCTGTGTCCACCATTGTTTCCCCAATCATGACGCCTTT 2466570

Qy     429 CTGCAT 434
           |  |||
Db 2466569 CCTCAT 2466564
Job time : 505 secs 



